
Bioinformatics Advances: latest articles

Integrating differential privacy into federated multi-task learning algorithms in dsMTL.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-11-23 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf298
Roman Schefzik, Han Cao, Sivanesan Rajan, Xavier Escribà-Montagut, Juan R González, Emanuel Schwarz

Motivation: Multi-task learning (MTL) enables simultaneous learning of related regression or classification tasks by exploiting shared information. The R package dsMTL provides a computational framework for federated MTL approaches, supporting the analysis of sensitive, individual-level data from geographically distributed data sources using the DataSHIELD platform. While the current architecture provides comprehensive data security mechanisms, these are not specifically tailored to MTL models. In particular, these models may still be vulnerable to membership inference attacks, which use the model to try to determine whether a specific individual was included in a given training set.

Results: To further enhance the privacy-preserving capabilities of dsMTL and protect against such attacks, differential privacy using the Laplace mechanism is integrated into dsMTL as a novel optional feature. This approach aims to obscure individual-level characteristics from the model while retaining group-level differences. The differential privacy implementation is validated in both simulation studies and a case study identifying schizophrenia patients from gene expression data. For practical utility, it is crucial to balance the degree of privacy protection against the preservation of model performance by choosing a reasonable privacy parameter for the differential privacy mechanism.
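The trade-off governed by the privacy parameter can be illustrated with a minimal sketch of the generic Laplace mechanism. This is not dsMTL code: the abstract does not say which quantity dsMTL perturbs or how sensitivity is bounded, so the function names, the perturbed values, and the sensitivity of 1.0 are all assumptions made for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample.

    The difference of two independent Exp(1) draws follows Laplace(0, 1).
    """
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def laplace_mechanism(values, sensitivity, epsilon, rng=None):
    """Return values perturbed with Laplace noise of scale sensitivity / epsilon.

    A smaller epsilon means more noise (stronger privacy, lower utility).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if sensitivity < 0:
        raise ValueError("sensitivity must be non-negative")
    rng = rng if rng is not None else random.Random()
    scale = sensitivity / epsilon
    return [v + laplace_noise(scale, rng) for v in values]


# Hypothetical model coefficients, perturbed under two privacy budgets.
coefficients = [0.8, -1.2, 0.0, 2.5]
rng = random.Random(0)
strong_privacy = laplace_mechanism(coefficients, sensitivity=1.0, epsilon=0.1, rng=rng)
weak_privacy = laplace_mechanism(coefficients, sensitivity=1.0, epsilon=10.0, rng=rng)
```

With epsilon = 0.1 the noise scale is 10, which swamps coefficients of this size. With epsilon = 10 the scale is 0.1 and the coefficients are nearly unchanged. That is the balance between protection and model performance the abstract refers to.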

Availability and implementation: dsMTL is open-source and available at https://github.com/transbioZI/dsMTLBase (server-side) and https://github.com/transbioZI/dsMTLClient (client-side).

Bioinformatics Advances 5(1): vbaf298. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12701803/pdf/
Citations: 0
geomeTriD: a Bioconductor package for interactive and integrative visualization of 3D structural model with multi-omics data.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-11-23 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf299
Jianhong Ou, Kenneth D Poss

Motivation: The three-dimensional organization of the genome plays a critical role in regulating gene expression by shaping the spatial and temporal interactions between regulatory elements. High-throughput chromosome conformation capture (Hi-C) technologies, along with immunoprecipitation- or chromatin accessibility-based chromatin architecture mapping methods, enable the measurement of chromatin dynamics at both bulk and single-cell levels. However, effectively exploring and comparing chromatin structures remains challenging, particularly when integrating multiple layers of genomic annotation or comparing structural dynamics across conditions. While several tools support interactive 3D genome visualization, few provide a flexible, R-integrated framework that supports custom annotations, side-by-side comparison of multiple stages or conditions, and deployment in Shiny applications.

Results: To address this need, we have developed geomeTriD, an R/Bioconductor package that enables interactive visualization of chromatin structures using three.js, supports multi-layer annotation, allows parallel comparison of two chromatin states, and is compatible with Shiny-based analysis workflows. As multi-omic and spatial genomic datasets grow in complexity, geomeTriD will facilitate the reconstruction and comparison of 3D genome structures across conditions, linking chromatin architecture to gene regulation, epigenetic states, and cell-state transitions.

Availability and implementation: geomeTriD is freely available at https://bioconductor.org/packages/geomeTriD.

Bioinformatics Advances 5(1): vbaf299. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12702139/pdf/
Citations: 0
Performance assessment of phylogenetic inference tools using PhyloSmew.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-11-23 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf300
Dimitri Höhler, Julia Haag, Alexey M Kozlov, Benoit Morel, Alexandros Stamatakis

Motivation: The performance of phylogenetic inference tools is commonly evaluated using simulated as well as empirical sequence data alignments. An open question is how representative these alignments are of those commonly analyzed by users. Using the RAxMLGrove database, it is now possible to simulate DNA and amino acid sequences based on more than 70 000 representative RAxML and RAxML-NG tree inferences on empirical datasets conducted on the RAxML web servers. This makes it possible to assess the phylogenetic tree inference accuracy of various inference tools on more realistic and representative simulated alignments.

Results: To automate this process, we implement PhyloSmew, a tool for benchmarking phylogenetic inference tools. We use it to simulate ∼20 000 multiple sequence alignments (MSAs) based on representative empirical trees (in terms of signal strength) from RAxMLGrove. We subsequently analyze 5000 empirical MSAs from the TreeBASE database to assess the inference accuracy of FastTree2, IQ-TREE2, and RAxML-NG. We find that on quantifiably difficult-to-analyze MSAs, all three tree inference tools perform poorly. Hence, the faster FastTree2 tool constitutes a viable alternative for inferring trees on difficult MSAs. We also find that there are substantial differences between accuracy results on simulated versus empirical data.
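On simulated alignments, accuracy can be measured by comparing each inferred tree with the tree used for simulation. The abstract does not name the metric PhyloSmew reports, so the following sketch assumes one common choice, the Robinson-Foulds distance on unrooted topologies. The nested-tuple tree encoding and all function names are hypothetical.

```python
def leaf_set(tree):
    """Return the set of leaf names below a node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Collect the non-trivial splits induced by the internal branches.

    Each split is stored as the side that does NOT contain the
    alphabetically smallest leaf, so both orientations normalize equally.
    """
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - clade if anchor in clade else clade
        # Keep only splits with at least two leaves on each side.
        if 2 <= len(side) <= len(all_leaves) - 2:
            splits.add(side)
        return clade

    visit(tree)
    return splits


def robinson_foulds(tree_a, tree_b):
    """Count the splits present in exactly one of the two trees."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must share the same leaf set")
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


# Hypothetical five-taxon example: simulation tree versus an inferred tree.
true_tree = ((("A", "B"), "C"), ("D", "E"))
inferred_tree = ((("A", "C"), "B"), ("D", "E"))
distance = robinson_foulds(true_tree, inferred_tree)
```

A distance of zero means the inferred topology matches the simulation tree. For empirical alignments no true tree exists, which is one reason results on simulated and empirical data can differ.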

Availability and implementation: The data underlying this article are available at https://github.com/angtft/PhyloSmew and https://cme.h-its.org/exelixis/material/accuracy-study/data.tar.gz.

Bioinformatics Advances 5(1): vbaf300. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12701799/pdf/
Citations: 0
SNPraefentia: a toolkit to prioritize microbial genome variants linked to health and disease.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-11-22 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf297
Nadeem Khan, Muhammad Muneeb Nasir, Ammar Mushtaq, Masood Ur Rehman Kayani

Motivation: Analysis of genomic variation in microbial genomes is crucial for understanding how microbes adapt, interact with their hosts, and influence health and disease. In metagenomic studies, where genetic material from entire microbial communities is sequenced, thousands of single-nucleotide polymorphisms (SNPs) can be detected across species and samples. However, identifying which of these variants have biologically or functionally relevant effects remains a significant challenge.

Results: To address this, we present SNPraefentia, a Python-based toolkit for prioritizing microbial SNPs based on their predicted functional relevance. The tool integrates multiple biologically meaningful parameters, including sequencing depth, physicochemical impact of amino acid substitutions, and the structural and functional context of mutations within annotated protein domains. SNPraefentia extracts variation depth and amino acid changes, annotates protein domains using UniProt, and computes individual impact scores. These are then integrated into a composite prioritization score that reflects the potential biological importance of each variant. Overall, SNPraefentia provides researchers with a systematic and reproducible approach to filter and rank microbial variants for downstream functional analysis or experimental validation.
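How such component scores might be folded into one ranking value can be sketched as follows. This is not SNPraefentia's scoring formula, which the abstract does not specify. The weights, the depth cap, and the use of the Kyte-Doolittle hydropathy difference as the physicochemical term are assumptions chosen only to make the idea concrete.

```python
# Kyte-Doolittle hydropathy index per amino acid (one-letter codes).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
MAX_HYDROPATHY_SHIFT = 9.0  # largest possible difference (I versus R)


def physicochemical_score(ref_aa, alt_aa):
    """Hydropathy shift of the substitution, scaled to [0, 1]."""
    return abs(HYDROPATHY[ref_aa] - HYDROPATHY[alt_aa]) / MAX_HYDROPATHY_SHIFT


def depth_score(depth, depth_cap=100):
    """Sequencing depth scaled to [0, 1], saturating at depth_cap."""
    return min(depth, depth_cap) / depth_cap


def composite_score(ref_aa, alt_aa, depth, in_domain, weights=(0.4, 0.4, 0.2)):
    """Weighted sum of depth, physicochemical, and domain-context terms.

    The weights are illustrative placeholders, not published values.
    """
    w_depth, w_chem, w_domain = weights
    domain_term = 1.0 if in_domain else 0.0
    return (
        w_depth * depth_score(depth)
        + w_chem * physicochemical_score(ref_aa, alt_aa)
        + w_domain * domain_term
    )


# Hypothetical variants: (label, reference aa, alternate aa, depth, in annotated domain).
variants = [
    ("varA", "I", "R", 120, True),
    ("varB", "L", "I", 80, False),
    ("varC", "G", "D", 15, True),
]
ranked = sorted(
    variants,
    key=lambda v: composite_score(v[1], v[2], v[3], v[4]),
    reverse=True,
)
```

Scaling every term to the same range before weighting keeps any single input, such as very deep coverage, from dominating the ranking.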

Availability and implementation: The toolkit and test data are freely available at https://github.com/muneebdev7/SNPraefentia.

Bioinformatics Advances 5(1): vbaf297. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12671963/pdf/
Citations: 0
PseudoChecker2 and PseudoViz: automation and visualization of gene loss in the Genome Era.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-11-22 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf202
Rui Resende-Pinto, Raquel Ruivo, Josefin Stiller, Rute Fonseca, Luís Filipe C Castro

Summary: High-fidelity genome assemblies provide unprecedented opportunities to decipher mechanisms of molecular evolution and phenotype landscapes. Here, we present PseudoChecker2, a command-line version of the web-tool PseudoChecker with expanded functions. It identifies gene loss via drastic mutational events such as premature stop codons, deletions and insertions. It enables the investigation of cross-species genomic datasets through: (i) integration into automated workflows, (ii) multiprocessing capability, and (iii) creation of a functional reference from annotation files. In addition, we introduce PseudoViz, a novel graphical interface designed to help interpret the results of PseudoChecker2 with intuitive visualizations. These tools combine the versatility and automation of a command-line tool with the user-friendliness of a graphical interface to tackle the challenges of the Genome Era.
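Two of the gene-loss signals named above, premature stop codons and frame-disrupting insertions or deletions, can be illustrated with a much-simplified sketch. This is not PseudoChecker2's implementation. It assumes an already extracted, in-frame coding sequence, the standard genetic code, and a plain length comparison instead of an alignment. The function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def premature_stop_positions(cds):
    """Return zero-based codon indices of in-frame stops before the last codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    # The final codon is the expected terminator, so it is not reported.
    return [idx for idx, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]


def length_change_disrupts_frame(reference_cds, target_cds):
    """True when the net length difference is not a multiple of three.

    This only catches a net frameshift. Compensating indels need an alignment.
    """
    return (len(target_cds) - len(reference_cds)) % 3 != 0


# Hypothetical sequences: an intact reference and a target carrying a
# premature TAG at codon index 2.
reference = "ATGAAACAGCCCTAA"
target = "ATGAAATAGCCCTAA"
stops = premature_stop_positions(target)
shifted = length_change_disrupts_frame(reference, "ATGAAACGCCCTAA")
```

A real analysis would first align the target region to the reference coding exons, since a stop codon only counts as premature relative to the reference reading frame.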

Availability and implementation: PseudoChecker2 and PseudoViz are fully available at https://github.com/rresendepinto/PseudoChecker2 and https://github.com/rresendepinto/PseudoViz.

Bioinformatics Advances 5(1): vbaf202. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12679834/pdf/
Citations: 0
Snappy: fast identification of DNA methylation motifs based on Oxford Nanopore reads.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-11-21 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf296
Dmitry N Konanov, Danil V Krivonos, Vladislav V Babenko, Elena N Ilina

Motivation: DNA methylation in bacteria is now studied mainly with single-molecule sequencing technologies such as PacBio and Oxford Nanopore. In nanopore sequencing, methylated positions are called by dedicated models implemented directly in basecallers. Prokaryotic DNA methyltransferases are site-specific enzymes that catalyze methylation within specific methylation motifs. These motifs are usually inferred with third-party software such as MEME, which performs classical motif enrichment. However, currently used motif enrichment algorithms rely only on sequence data and do not use the additional base modification information provided by the basecaller.

Results: Herein, we present Snappy, a new tool that rethinks the original Snapper algorithm but does not use any enrichment heuristics and does not require control sample sequencing. Snappy combines basecalling data processing with a new graph-based enrichment algorithm, thus significantly enhancing the enrichment sensitivity and accuracy. The versatility of the method was shown on both our own and external data, representing different bacterial species with complex and simple methylomes.

Availability and implementation: Source code and documentation are hosted on GitHub (https://github.com/DNKonanov/ont-snappy) and Zenodo (zenodo.org/records/16731817). For accessibility, Snappy is installable from PyPI using the "pip install ont-snappy" command.

Bioinformatics Advances 5(1): vbaf296. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12679398/pdf/
Citations: 0
PSQAN: a pipeline to prioritize novel and biologically relevant transcripts from long-read RNA sequencing.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-11-20 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf293
Siddharth Sethi, Emil K Gustavsson, Harpreet Saini, Mina Ryten

Motivation: Long-read RNA sequencing has the potential to accurately quantify transcriptomes and reveal the isoform diversity of disease-causing genes. However, despite the recent advances in analysis tools for transcript discovery, long-read RNA sequencing data is still challenging to analyse, due to the detection of hundreds or even thousands of novel transcripts per gene.

Results: Here, we introduce PSQAN, a workflow to help researchers prioritize high-confidence and potentially biologically relevant transcripts associated with candidate genes and make transcript characterization results more interpretable. PSQAN performs a gene-based analysis on characterized transcripts generated by SQANTI3 and TALON. PSQAN re-groups transcripts into easily interpretable categories to facilitate their prioritization, allows transcript-level expression thresholds, and generates visualizations to determine optimal expression thresholds. Overall, we demonstrate that PSQAN is a useful tool which enables users to identify known and novel transcripts of potential biological importance.
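The effect of a transcript-level expression threshold can be shown with a small sketch that counts how many transcripts survive at each candidate cutoff. This is not PSQAN code, which is implemented in Snakemake and R. The data layout, the minimum-samples rule, and the function name are assumptions for illustration.

```python
def retained_counts(expression, thresholds, min_samples=1):
    """Count transcripts kept at each threshold.

    expression maps a transcript ID to its per-sample normalized expression.
    A transcript is kept when at least min_samples values reach the threshold.
    """
    counts = {}
    for threshold in thresholds:
        kept = [
            transcript_id
            for transcript_id, values in expression.items()
            if sum(value >= threshold for value in values) >= min_samples
        ]
        counts[threshold] = len(kept)
    return counts


# Hypothetical normalized expression for three transcripts in two samples.
expression = {
    "tx1": [5.0, 0.2],
    "tx2": [0.3, 0.4],
    "tx3": [1.5, 2.0],
}
counts_any_sample = retained_counts(expression, [0.1, 1.0, 3.0])
counts_both_samples = retained_counts(expression, [0.1, 1.0, 3.0], min_samples=2)
```

Plotting such counts against the threshold is one simple way to see where low-abundance novel transcripts start to drop out.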

Availability and implementation: PSQAN is an analysis workflow implemented in Snakemake and R and is licensed under the GNU General Public License version 3. The source code and documentation of this tool are available at https://github.com/sid-sethi/PSQAN.

Bioinformatics Advances 5(1): vbaf293. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12701792/pdf/
Citations: 0
Adding highly variable genes to spatially variable genes can improve cell type clustering performance in spatial transcriptomics data.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-11-20 eCollection Date: 2026-01-01 DOI: 10.1093/bioadv/vbaf285
Yijun Li, Stefan Stanojevic, Bing He, Zheng Jing, Qianhui Huang, Jian Kang, Lana X Garmire

Motivation: Spatial transcriptomics has allowed researchers to analyze transcriptome data in its tissue sample's spatial context. Various methods have been developed for detecting spatially variable genes (SV genes), whose gene expression over the tissue space shows strong spatial autocorrelation. Such genes are often used to define clusters in cells or spots downstream. However, highly variable (HV) genes, whose quantitative gene expressions show significant variation from cell to cell, are conventionally used in clustering analyses.

Results: In this report, we investigate whether adding highly variable genes to spatially variable genes can improve the cell type clustering performance in spatial transcriptomics data. We tested the clustering performance of HV genes, SV genes, and the union of both gene sets (concatenation) on over 50 real spatial transcriptomics datasets across multiple platforms, using a variety of spatial and non-spatial metrics. Our results show that combining HV genes and SV genes can improve overall cell-type clustering performance.
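The "union of both gene sets" described above can be illustrated with a minimal sketch. This is not the authors' code (their pipeline is in the linked repository); the function name `union_gene_sets` and the gene names are hypothetical placeholders, and the sketch assumes the HV and SV genes have already been selected by some upstream method.

```python
def union_gene_sets(hv_genes, sv_genes):
    """Return the order-preserving union of two gene lists (hypothetical helper)."""
    seen = set()
    combined = []
    # Walk HV genes first, then SV genes, keeping each gene only once.
    for gene in list(hv_genes) + list(sv_genes):
        if gene not in seen:
            seen.add(gene)
            combined.append(gene)
    return combined


# Placeholder gene names, for illustration only.
hv = ["GeneA", "GeneB", "GeneC"]
sv = ["GeneC", "GeneD"]
combined = union_gene_sets(hv, sv)
print(combined)
```

The combined list would then be used to subset the expression matrix before clustering; a gene selected by both criteria appears only once.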

Availability and implementation: All data and code used in this evaluation study can be found at the following link: https://github.com/lanagarmire/ST_benchmark.

Citations: 0
SiaScoreNet: a siamese neural network-based model integrating prediction scores for HLA-peptide interaction prediction.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-11-19 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf248
Mahsa Saadat, Fatemeh Zare-Mirakabad, Milad Besharatifard

Motivation: Cancer immunotherapy uses the immune system to recognize and eliminate tumor cells by presenting tumor antigens through Human Leukocyte Antigen (HLA) molecules. Accurate prediction of HLA-peptide interactions is essential for personalized immunotherapy development. Allele-specific models achieve high accuracy and handle variable peptide lengths but require separate training for each allele, limiting scalability to rare or unseen HLAs. Pan-specific models generalize across multiple alleles and match or surpass allele-specific methods. Ensemble methods improve prediction by combining outputs from multiple predictors, often via linear combinations, though nonlinear strategies may better capture HLA-peptide complexities. We propose SiaScoreNet, a three-step predictive pipeline enhancing HLA-peptide interaction prediction. First, ESM, a pretrained transformer-based protein language model, embeds HLA and peptide sequences into fixed-length representations, accommodating varying sequence lengths. Second, we integrate predicted scores from state-of-the-art models into a comprehensive feature vector. Third, a nonlinear ensemble strategy combines features, capturing complex dependencies and boosting performance.
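The contrast between a linear and a nonlinear combination of predictor scores can be sketched as follows. This is a generic illustration, not the SiaScoreNet architecture: the function names, the single tanh hidden layer, and all weights are assumptions chosen by hand for the example, whereas a real model would learn its weights from data and would also use the sequence embeddings.

```python
import math


def linear_ensemble(scores, weights):
    """Weighted sum of the individual predictor scores."""
    return sum(w * s for w, s in zip(weights, scores))


def nonlinear_ensemble(scores, hidden_weights, hidden_biases, out_weights, out_bias):
    """One tanh hidden layer followed by a sigmoid output (illustrative only)."""
    hidden = [
        math.tanh(sum(w * s for w, s in zip(row, scores)) + b)
        for row, b in zip(hidden_weights, hidden_biases)
    ]
    logit = sum(w * h for w, h in zip(out_weights, hidden)) + out_bias
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical binding scores from two upstream predictors for one HLA-peptide pair.
scores = [0.2, 0.8]

linear_score = linear_ensemble(scores, weights=[0.5, 0.5])
nonlinear_score = nonlinear_ensemble(
    scores,
    hidden_weights=[[1.0, -1.0], [0.5, 0.5]],  # hand-picked, not learned
    hidden_biases=[0.0, 0.0],
    out_weights=[1.0, 1.0],
    out_bias=0.0,
)
print(linear_score, nonlinear_score)
```

The linear version can only reweight the predictors, while the hidden layer lets the output depend on interactions between them, for example the disagreement between the two scores captured by the first hidden unit.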

Results: Benchmark evaluations show SiaScoreNet outperforms existing models in accuracy, comparable to TransPHLA, BigMHC, and CapHLA. Recent models prioritize recall over precision, valuable for identifying potential binders but resource-intensive. SiaScoreNet offers improved performance and runtime efficiency compared to these models, evaluated against HPV viruses for HLA-peptide prediction.

Availability and implementation: The data and source code for the predictions and experiments presented in this study are publicly available in the SiaScoreNet repository hosted on GitHub: https://github.com/CBRC-lab/SiaScoreNet.

Citations: 0
MxlPy-Python package for mechanistic learning and hybrid modelling in life science.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-11-18 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf294
Marvin van Aalst, Tim Nies, Tobias Pfennig, Anna Matuszyńska

Summary: Recent advances in artificial intelligence have accelerated the adoption of machine learning (ML) in biology, enabling powerful predictive models across diverse applications. However, in scientific research, the need for interpretability and mechanistic insight remains crucial. To address this, we introduce MxlPy, a Python package that combines mechanistic modelling with ML to deliver explainable, data-informed solutions. MxlPy facilitates mechanistic learning, an emerging approach that integrates the transparency of mathematical models with the flexibility of data-driven methods. By streamlining tasks such as data integration, model formulation, output analysis, and surrogate modelling, MxlPy enhances the modelling experience without sacrificing interpretability. Designed for both computational biologists and interdisciplinary researchers, it supports the development of accurate, efficient, and explainable models, making it a valuable tool for advancing bioinformatics, systems biology, and biomedical research.

Availability and implementation: MxlPy source code is freely available at https://github.com/Computational-Biology-Aachen/MxlPy. The full documentation, with features and examples, can be found at https://computational-biology-aachen.github.io/MxlPy.

Citations: 0