Bioinformatics Advances: latest articles

Adding highly variable genes to spatially variable genes can improve cell type clustering performance in spatial transcriptomics data.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-11-20 eCollection Date: 2026-01-01 DOI: 10.1093/bioadv/vbaf285
Yijun Li, Stefan Stanojevic, Bing He, Zheng Jing, Qianhui Huang, Jian Kang, Lana X Garmire

Motivation: Spatial transcriptomics allows researchers to analyze transcriptome data in the spatial context of the tissue sample. Various methods have been developed for detecting spatially variable (SV) genes, whose expression shows strong spatial autocorrelation across the tissue. Such genes are often used to define clusters of cells or spots in downstream analyses. However, highly variable (HV) genes, whose expression varies significantly from cell to cell, are the conventional choice for clustering analyses.

Results: In this report, we investigate whether adding highly variable genes to spatially variable genes can improve the cell type clustering performance in spatial transcriptomics data. We tested the clustering performance of HV genes, SV genes, and the union of both gene sets (concatenation) on over 50 real spatial transcriptomics datasets across multiple platforms, using a variety of spatial and non-spatial metrics. Our results show that combining HV genes and SV genes can improve overall cell-type clustering performance.

Availability and implementation: All data and code used in this evaluation study are available at https://github.com/lanagarmire/ST_benchmark.
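
The three gene-selection strategies compared in this study (HV only, SV only, and their union) can be illustrated with a minimal Python sketch. It assumes that per-gene scores have already been computed (for example, a variance-based score for HV genes and a spatial autocorrelation statistic for SV genes). The gene names, score values, and the `top_genes` helper are hypothetical and are not taken from the authors' repository.

```python
def top_genes(scores, n):
    """Return the n highest-scoring gene names (ties broken alphabetically)."""
    ranked = sorted(scores, key=lambda gene: (-scores[gene], gene))
    return ranked[:n]


# Hypothetical per-gene scores, for illustration only.
hv_scores = {"GeneA": 2.1, "GeneB": 0.3, "GeneC": 1.7, "GeneD": 0.9}
sv_scores = {"GeneA": 0.05, "GeneB": 0.62, "GeneC": 0.48, "GeneD": 0.10}

hv_genes = top_genes(hv_scores, 2)  # highly variable selection
sv_genes = top_genes(sv_scores, 2)  # spatially variable selection

# Union of both sets: keep the HV order, then append SV genes not yet present.
combined = hv_genes + [gene for gene in sv_genes if gene not in hv_genes]
print(combined)
```

The combined list would then replace either single gene set as the feature input to whatever clustering method is being evaluated.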

SiaScoreNet: a siamese neural network-based model integrating prediction scores for HLA-peptide interaction prediction.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-11-19 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf248
Mahsa Saadat, Fatemeh Zare-Mirakabad, Milad Besharatifard

Motivation: Cancer immunotherapy uses the immune system to recognize and eliminate tumor cells by presenting tumor antigens through Human Leukocyte Antigen (HLA) molecules. Accurate prediction of HLA-peptide interactions is essential for personalized immunotherapy development. Allele-specific models achieve high accuracy and handle variable peptide lengths but require separate training for each allele, limiting scalability to rare or unseen HLAs. Pan-specific models generalize across multiple alleles and match or surpass allele-specific methods. Ensemble methods improve prediction by combining outputs from multiple predictors, often via linear combinations, though nonlinear strategies may better capture HLA-peptide complexities. We propose SiaScoreNet, a three-step predictive pipeline enhancing HLA-peptide interaction prediction. First, ESM, a pretrained transformer-based protein language model, embeds HLA and peptide sequences into fixed-length representations, accommodating varying sequence lengths. Second, we integrate predicted scores from state-of-the-art models into a comprehensive feature vector. Third, a nonlinear ensemble strategy combines features, capturing complex dependencies and boosting performance.

Results: Benchmark evaluations show that SiaScoreNet outperforms existing models in accuracy and is comparable to TransPHLA, BigMHC, and CapHLA. Recent models prioritize recall over precision, which is valuable for identifying potential binders but resource-intensive. Compared with these models, SiaScoreNet offers improved performance and runtime efficiency when evaluated on HPV for HLA-peptide prediction.

Availability and implementation: The data and source code for the prediction and experiments presented in this study are publicly available in the SiaScoreNet repository hosted on GitHub: https://github.com/CBRC-lab/SiaScoreNet.

MxlPy-Python package for mechanistic learning and hybrid modelling in life science.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-11-18 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf294
Marvin van Aalst, Tim Nies, Tobias Pfennig, Anna Matuszyńska

Summary: Recent advances in artificial intelligence have accelerated the adoption of machine learning (ML) in biology, enabling powerful predictive models across diverse applications. However, in scientific research, the need for interpretability and mechanistic insight remains crucial. To address this, we introduce MxlPy, a Python package that combines mechanistic modelling with ML to deliver explainable, data-informed solutions. MxlPy facilitates mechanistic learning, an emerging approach that integrates the transparency of mathematical models with the flexibility of data-driven methods. By streamlining tasks such as data integration, model formulation, output analysis, and surrogate modelling, MxlPy enhances the modelling experience without sacrificing interpretability. Designed for both computational biologists and interdisciplinary researchers, it supports the development of accurate, efficient, and explainable models, making it a valuable tool for advancing bioinformatics, systems biology, and biomedical research.

Availability and implementation: MxlPy source code is freely available at https://github.com/Computational-Biology-Aachen/MxlPy. The full documentation, with features and examples, can be found at https://computational-biology-aachen.github.io/MxlPy.

SurprisalAnalysis: an open-source software for information-theoretic analysis of gene expression.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-11-18 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf291
Annice Najafi

Summary: SurprisalAnalysis is an open-source R package with an accompanying web-based application that uses surprisal analysis to extract patterns of genes that tend to be up- or down-regulated as a result of a biological process. Surprisal analysis frames gene expression values in thermodynamic terms and identifies entropy-driven constraints and the associated gene weights, which allow each gene's expression to be decomposed into a baseline (maximal entropy) component and one or more constraint-driven components. These components correspond to distinct biological modules or processes whose coordinated up- or down-regulation underlies the observed system dynamics.

Availability and implementation: SurprisalAnalysis is written in R and is freely available on GitHub (https://github.com/AnniceNajafi/SurprisalAnalysis). The package is distributed under a permissive license to promote scientific collaboration and reproducibility. A web-based application with a Graphical User Interface (GUI) is hosted on https://najafiannice.shinyapps.io/surprisal_analysis_app/.
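
The decomposition described in the summary is usually written in the surprisal-analysis literature in the form below. The notation is an assumption made here for illustration and is not quoted from the package documentation: $X_i(t)$ is the expression of gene $i$ in sample or time point $t$, $X_i^{0}$ is its baseline (maximal entropy) level, $G_{i\alpha}$ is the weight of gene $i$ in constraint $\alpha$, and $\lambda_\alpha(t)$ is the amplitude of that constraint.

```latex
\ln X_i(t) = \ln X_i^{0} - \sum_{\alpha \geq 1} G_{i\alpha}\,\lambda_\alpha(t)
```

In this form, genes with a large $|G_{i\alpha}|$ are the ones whose expression deviates most from the baseline under constraint $\alpha$.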

GDC Cohort Copilot: an AI copilot for curating cohorts from the genomic data commons.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-11-18 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf295
Steven Song, Anirudh Subramanyam, Zhenyu Zhang, Aarti Venkat, Robert L Grossman

Motivation: The Genomic Data Commons (GDC) provides access to high-quality, harmonized cancer genomics data through a unified curation and analysis platform centered on patient cohorts. While GDC users can interactively create complex cohorts through the graphical Cohort Builder, users (especially new ones) may struggle to find specific cohort descriptors across hundreds of possible fields and properties. However, users may be better able to describe their desired cohort in free-text natural language.

Results: We introduce GDC Cohort Copilot, an open-source copilot tool for curating cohorts from the GDC. GDC Cohort Copilot automatically generates the GDC cohort filter corresponding to a user-input natural language description of their desired cohort, before exporting the cohort back to the GDC for further analysis. An interactive user interface allows users to further refine the generated cohort. We develop and evaluate multiple large language models (LLMs) for GDC Cohort Copilot and demonstrate that our locally-served, open-source GDC Cohort LLM achieves better results than GPT-4o prompting in generating GDC cohorts.

Availability and implementation: We implement and share GDC Cohort Copilot as a containerized Gradio app on HuggingFace Spaces, available at https://huggingface.co/spaces/uc-ctds/GDC-Cohort-Copilot. GDC Cohort LLM weights are available at https://huggingface.co/uc-ctds. All source code is available at https://github.com/uc-cdis/gdc-cohort-copilot.

SeqManager: a web-based tool for efficient sequencing data storage management and duplicate detection.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-11-13 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf282
Margot Celerier, Andrew J Oldfield, William Ritchie

Motivation: Modern genomics laboratories generate massive volumes of sequencing data, often resulting in significant storage costs. Much of this storage is occupied by duplicate files, temporary processing files, and redundant intermediate data.

Results: We developed SeqManager, a web-based application that provides automated identification, classification, and management of sequencing data files with intelligent duplicate detection. It also detects intermediate sequencing files that can safely be removed. Evaluation across four genomics laboratory settings demonstrates that our tool is fast and has a very low memory footprint.

Availability and implementation: SeqManager is freely available under the MIT license at https://github.com/AIGeneRegulation/Sequencing-Data-Manager.
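
The abstract does not say how SeqManager detects duplicates. As a generic illustration of content-based duplicate detection, and not a description of SeqManager's implementation, the sketch below groups files by size first and hashes only the files that share a size. The function names and the demo file names are hypothetical.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files are never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return groups of files under root that have identical content."""
    by_size = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        for path in paths:
            by_hash[file_digest(path)].append(path)
    return [sorted(group) for group in by_hash.values() if len(group) > 1]


# Small demo on throwaway files (hypothetical names and contents).
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.fastq").write_bytes(b"ACGT\n")
    Path(tmp, "copy_of_a.fastq").write_bytes(b"ACGT\n")
    Path(tmp, "b.fastq").write_bytes(b"TTTT\n")  # same size, different content
    duplicate_names = [[p.name for p in group] for group in find_duplicates(tmp)]

print(duplicate_names)
```

Comparing sizes before hashing avoids reading most files at all, which matters when the files are multi-gigabyte sequencing outputs.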

Unveiling novel drug-target couples: an empowered automated pipeline for enhanced virtual screening using AutoDock Vina.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-11-12 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf267
Sveva Bonomi, Stefano Carsi, Emily Samuela Turilli-Ghisolfi, Elisa Oltra, Tiziana Alberio, Mauro Fasano

Motivation: Drug repurposing offers a cost-effective and time-efficient strategy for identifying new therapeutic uses for existing medications, capitalizing on their known safety profiles and pharmacokinetics. We present an automated virtual screening pipeline using AutoDock Vina, a molecular docking software that predicts how small molecules bind to protein targets. This pipeline enhances the speed and accuracy of drug candidate identification by automating and parallelizing the docking process.

Results: We developed and validated a fully automated virtual screening pipeline based on AutoDock Vina, enabling computational parallelization and random ligand positioning without relying on prior knowledge of biologically active protein domains. As a proof of concept, the pipeline was applied to the "serotonin and anxiety" pathway. Docking results were compared with known drug-target interactions, demonstrating the ability of the pipeline to reliably identify compounds interacting with serotonin receptors. This case study confirms the pipeline's effectiveness in supporting drug repurposing by identifying promising candidates for further experimental validation.

Availability and implementation: The AutoDock Vina automation pipeline is freely available for noncommercial use at https://gitlab.com/la_sveva/pip2.0. It is compatible with Linux systems, and a Docker image is provided for ease of deployment and reproducibility. Researchers can easily integrate the pipeline into existing workflows, supporting broader adoption in virtual screening and drug repurposing projects.

FroM Superstring to Indexing: a space-efficient index for unconstrained k-mer sets using the Masked Burrows-Wheeler Transform (MBWT).
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-11-12 eCollection Date: 2026-01-01 DOI: 10.1093/bioadv/vbaf290
Ondřej Sladký, Pavel Veselý, Karel Břinda

Motivation: The growing volumes and heterogeneity of genomic data call for scalable and versatile k-mer-set indexes. However, state-of-the-art indexes such as SBWT and SSHash depend on long non-branching paths in de Bruijn graphs, which limits their efficiency for small k, sampled data, or high-diversity settings.

Results: We introduce FMSI, a superstring-based index for arbitrary k-mer sets that supports efficient membership and compressed dictionary queries with strong theoretical guarantees. FMSI builds on recent advances in k-mer superstrings and uses the Masked Burrows-Wheeler Transform, a novel extension of the classical Burrows-Wheeler Transform that incorporates position masking. Across a range of k values and dataset types (genomic, pangenomic, and metagenomic), FMSI consistently achieves superior query space efficiency, using up to 2-3× less memory than state-of-the-art methods while maintaining competitive query times. Only a space-optimized version of SBWT can match FMSI's footprint in some cases, but FMSI is then 2-3× faster. Our results establish superstring-based indexing as a robust, scalable, and versatile framework for arbitrary k-mer sets across diverse bioinformatics applications.
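To make the idea of a superstring with position masking more concrete, here is a toy sketch in Python. It is not FMSI's data structure (which the abstract describes as built on the Masked Burrows-Wheeler Transform); it only assumes that a k-mer set can be stored as one string plus a bit mask, where a `1` at position i means the k-mer starting at i belongs to the set. The function names, the example string, and the linear-scan query are all hypothetical.

```python
def kmers_from_masked_superstring(superstring, mask, k):
    """Return the k-mers that start at positions where the mask bit is '1'."""
    if len(superstring) != len(mask):
        raise ValueError("superstring and mask must have equal length")
    kmers = set()
    for i in range(len(superstring) - k + 1):
        if mask[i] == "1":
            kmers.add(superstring[i:i + k])
    return kmers


def contains(superstring, mask, k, query):
    """Naive membership test: is any occurrence of `query` switched on by the mask?"""
    if len(query) != k:
        return False
    start = superstring.find(query)
    while start != -1:
        if mask[start] == "1":
            return True
        start = superstring.find(query, start + 1)
    return False


# Toy example (assumed data): the set {ACG, CGT, TAC} stored in one string.
SUPERSTRING = "ACGTAC"
MASK = "110100"  # positions 0 (ACG), 1 (CGT) and 3 (TAC) are on; GTA at 2 is off
K = 3
```

A real index would replace the linear scan in `contains` with a compressed full-text index; the sketch only shows how the mask separates k-mers that are in the set from k-mers that merely appear in the superstring.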

Availability and implementation: FMSI is developed in C++ and released under the MIT license, with source code provided at https://github.com/OndrejSladky/fmsi and an installable package available through Bioconda. The datasets used in the experiments are deposited at Zenodo (https://doi.org/10.5281/zenodo.14722244).

Citations: 0
Exploring SARS-CoV-2 spike protein mutations through genetic algorithm-driven structural modeling.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-11-11 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf288
Valentina Di Salvatore, Avisa Maleki, Babak Mohajer, Alvaro Ras-Carmona, Giulia Russo, Pedro Antonio Reche, Francesco Pappalardo

Motivation: The rapid evolution of SARS-CoV-2 highlights the importance of computational approaches to explore mutational effects on the viral spike protein. In this work, we present a genetic algorithm (GA) framework applied to the structural optimization of spike protein variants, with a focus on energetic and binding properties rather than direct evolutionary prediction.

Results: Our GA-driven pipeline generated spike variants with progressively improved structural stability, as indicated by lower Discrete Optimized Protein Energy (DOPE) scores across generations. The approach also enabled evaluation of Gibbs free energy and binding affinity for spike-angiotensin-converting enzyme 2 (ACE2) receptor interactions, revealing candidate conformations with favorable thermodynamic properties. These results demonstrate the algorithm's capacity to refine protein models and explore mutational landscapes in silico, although no validation against naturally emerging variants was performed. This study presents a methodological framework for GA-based structural modeling of SARS-CoV-2 spike mutations. Rather than forecasting specific variants of concern, it demonstrates the feasibility of a computational approach that can be extended and integrated with evolutionary and experimental evidence to strengthen future efforts in variant monitoring and vaccine development.
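The generation-by-generation improvement described above can be illustrated with a generic genetic algorithm loop. This is a minimal sketch and not the authors' pipeline: `toy_energy` is a hypothetical stand-in for a structural energy score (lower is better), and the selection scheme, mutation rate, and population size are arbitrary assumptions.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_energy(sequence, target):
    """Hypothetical stand-in for an energy score: mismatches against a target."""
    return sum(1 for a, b in zip(sequence, target) if a != b)


def mutate(sequence, rng, rate):
    """Replace each residue with a random amino acid with probability `rate`."""
    return "".join(
        rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in sequence
    )


def crossover(a, b, rng):
    """Single-point crossover of two equal-length parents."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def run_ga(start, target, generations=30, pop_size=20, rate=0.05, seed=0):
    """Return the best (lowest) energy observed at the start of each generation."""
    rng = random.Random(seed)
    population = [mutate(start, rng, rate) for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        population.sort(key=lambda s: toy_energy(s, target))
        best_per_generation.append(toy_energy(population[0], target))
        parents = population[: pop_size // 2]  # truncation selection
        children = [population[0]]             # elitism: keep the current best
        while len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            children.append(mutate(crossover(p1, p2, rng), rng, rate))
        population = children
    return best_per_generation


history = run_ga("AAAAAAAAAA", "ACDEFGHIKL")
```

Because the best individual is carried over unchanged, the recorded best energy can never increase from one generation to the next, which mirrors the "progressively lower score" behaviour in spirit only.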

Availability and implementation: All Python and R scripts are available from the authors upon request.

Citations: 0
KSMoFinder-knowledge graph embedding of proteins and motifs for predicting kinases of human phosphosites.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-11-11 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf289
Manju Anandakrishnan, Karen E Ross, Chuming Chen, K Vijay-Shanker, Cathy H Wu

Motivation: Protein kinases regulate cellular signaling pathways through a cascade of phosphorylation activity, selectively targeting specific residues on substrate proteins (phosphosites). The characteristics of kinases that phosphorylate specific substrates have been extensively studied. Most tools utilize amino acid sequence motifs around phosphosites but do not consider the substrate protein's biological characteristics.

Results: We present KSMoFinder, a kinase-substrate-motif prediction model that learns factors beyond motif similarities by integrating proteins' biological contexts. We learn the semantics in a knowledge graph containing proteins' contextual relationships, kinase-specific motifs, and motif composition, and represent the proteins and motifs as vectors. Using these representations as features, we train a supervised deep-learning classifier to identify kinase-phosphosite relationships. We use a ground-truth kinase-substrate-motif dataset from iPTMnet and PhosphoSitePlus to evaluate KSMoFinder's prediction performance. Pairwise comparative assessments with prior kinase-substrate prediction tools demonstrate KSMoFinder's superior performance. KSMoFinder trained on our knowledge graph embeddings surpasses the prediction performance obtained with embeddings from popular protein language models such as ProtT5, ESM2, and ESM3, achieving a ROC-AUC of 0.851 and a PR-AUC of 0.839 on a testing dataset with an equal number of positives and negatives. Unlike most existing tools, KSMoFinder can be used to predict at both the motif level and the substrate protein level.
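For readers unfamiliar with the two reported metrics, the sketch below computes them on a tiny balanced example. The labels and scores are invented for illustration and have nothing to do with the paper's data. ROC-AUC is computed as the fraction of positive-negative pairs ranked correctly, and PR-AUC is approximated by average precision, which is one common estimator and an assumption here (the abstract does not say how PR-AUC was computed).

```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive outranks a random
    negative; tied pairs count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Invented, balanced toy data: three positives and three negatives.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

auc = roc_auc(labels, scores)           # 8 of 9 positive-negative pairs are ordered correctly
ap = average_precision(labels, scores)  # (1/1 + 2/2 + 3/4) / 3
```

On a balanced test set such as the one described above, a random classifier would score about 0.5 on both metrics, which gives a baseline for reading the reported values.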

Availability and implementation: Source code is available at https://github.com/manju-anandakrishnan/KSMoFinder.

Citations: 0