Bioinformatics Advances: latest articles

scExplorer: a comprehensive web server for single-cell RNA sequencing data analysis.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-11-03 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf273
Sergio Hernández-Galaz, Andrés Hernández-Olivera, Felipe Villanelo, Alvaro Lladser, Alberto J M Martin

Summary: Computational analysis of single-cell RNA sequencing (scRNA-seq) data presents significant barriers for researchers lacking programming expertise, particularly for multi-dataset integration, scalable job management, and reproducible workflows. We developed scExplorer, a web-based platform that addresses these limitations through three key innovations: comprehensive batch correction using four state-of-the-art algorithms (ComBat, Scanorama, BBKNN, and Harmony); SLURM-based job scheduling with pause/resume functionality for large-scale analyses; and automated generation of publication-ready reports with exportable configuration files that ensure complete reproducibility. The platform's modular Docker architecture supports both standalone and client-server deployments, enabling analysis of datasets ranging from thousands to hundreds of thousands of cells. An openly documented REST API clarifies how the interface orchestrates analyses and supports transparent operation. scExplorer eliminates the technical barriers that prevent non-computational researchers from performing rigorous scRNA-seq analysis while maintaining the transparency and reproducibility standards required for collaborative research.
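
The idea of an exportable configuration file can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `AnalysisConfig` fields, the function names, and the JSON layout are hypothetical and are not taken from scExplorer. Only the four batch-correction method names come from the abstract.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

# The four batch-correction methods named in the abstract.
ALLOWED_METHODS = {"combat", "scanorama", "bbknn", "harmony"}


@dataclass(frozen=True)
class AnalysisConfig:
    """Hypothetical analysis settings; field names are illustrative only."""
    batch_method: str = "harmony"
    batch_key: str = "sample"   # assumed metadata column holding the batch label
    n_top_genes: int = 2000     # assumed parameter, not from the paper
    random_seed: int = 0


def export_config(cfg: AnalysisConfig) -> str:
    """Serialize deterministically (sorted keys) so identical settings give identical files."""
    if cfg.batch_method not in ALLOWED_METHODS:
        raise ValueError(f"unknown batch method: {cfg.batch_method}")
    return json.dumps(asdict(cfg), sort_keys=True, indent=2)


def load_config(text: str) -> AnalysisConfig:
    """Rebuild the settings object from an exported file."""
    return AnalysisConfig(**json.loads(text))


def fingerprint(text: str) -> str:
    """SHA-256 of the exported file, usable as a short identifier in a report."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

Exporting, reloading, and re-exporting a configuration yields the same text and therefore the same fingerprint, which is the property a reproducible rerun would rely on.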

Availability and implementation: https://apps.cienciavida.org/scexplorer/.

Citations: 0
CoRTE: a web-service for constructing temporal networks from genotype-tissue expression data.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-10-31 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf272
Pietro Cinaglia, Mario Cannataro

Motivation: Deciphering the dynamics of gene expression comprehensively is essential for understanding intricate biological mechanisms, and network science, specifically Gene Co-expression Networks (GCNs), offers an effective way to study them. However, a typical GCN is based on a static model, which limits its ability to reflect changes that occur over time. To overcome this issue, we designed an open-source, user-friendly web service for constructing temporal networks from genotype-tissue expression data: COnstructing Real-world TEmporal networks (CoRTE).

Results: CoRTE constructs a temporal network from a statistical analysis of gene co-expression across successive age ranges, which define an ordered set of time points. In our experiments, we investigated gene co-expression dynamics across age groups in brain tissues associated with Alzheimer's disease, processing curated aging-related data via the proposed web service. The service generated a temporal network consisting of gene pairs that showed statistically significant co-expression over time. The results demonstrated its capacity to capture time-dependent gene interactions relevant to aging-related disease progression. From an application standpoint, CoRTE may be particularly suitable for exploring aging-related changes, disease development, and other time-dependent biological events.
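
A temporal co-expression network of this kind can be sketched as one edge set per age range. The sketch below is not CoRTE's implementation: it swaps the statistical significance test described in the abstract for a plain Pearson correlation threshold, and the function names, the `threshold` parameter, and the input layout are all assumptions.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def temporal_network(expr_by_age, threshold=0.9):
    """Build one co-expression snapshot per age range.

    expr_by_age maps an age-range label to {gene: [expression per sample]}.
    Age ranges are processed in insertion order, which gives the ordered
    time points. An edge is kept when |r| >= threshold (a stand-in for a
    proper significance test).
    """
    snapshots = []
    for age_range, expr in expr_by_age.items():
        edges = set()
        for gene_a, gene_b in combinations(sorted(expr), 2):
            if abs(pearson(expr[gene_a], expr[gene_b])) >= threshold:
                edges.add((gene_a, gene_b))
        snapshots.append((age_range, edges))
    return snapshots
```

Comparing the edge sets of consecutive snapshots shows which gene pairs gain or lose co-expression between age ranges.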

Availability and implementation: CoRTE is freely available at https://github.com/pietrocinaglia/corte-ws.

Citations: 0
Long short-term memory-based deep learning model for the discovery of antimicrobial peptides targeting Mycobacterium tuberculosis.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-10-31 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf274
Linfeng Wang, Susana Campino, Taane G Clark, Jody E Phelan

Motivation: Tuberculosis, caused by Mycobacterium tuberculosis, remains a global health challenge driven by rising antibiotic resistance. Antimicrobial peptides offer a promising alternative due to membrane-disruptive activity and low resistance potential, yet the scarcity of TB-specific AMP data constrains targeted development. We present a reproducible deep learning protocol that integrates long short-term memory networks with transfer learning to classify and generate TB-active peptides.

Results: Classifiers were pretrained on a large corpus of general AMPs and fine-tuned on curated TB-specific sequences using frozen encoder and full backpropagation strategies. We benchmarked four model variants [unidirectional and bidirectional long short-term memories (LSTMs), with and without attention] on a held-out TB test set; the unidirectional LSTM with a frozen encoder achieved the best performance (accuracy 90%, AUC 0.97). In parallel, LSTM-based generative models were trained to produce de novo TB-active peptides. A generator trained exclusively on TB data produced 94 of 100 peptides predicted as antimicrobial by AMP Scanner, outperforming transfer learning-based generators. Generated peptides were evaluated for antimicrobial activity, toxicity, structure, and AMP-like physicochemical traits, and four candidates shared ≥84% identity with known TB-AMPs.
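
Held-out metrics such as the accuracy and AUC quoted above are computed from true labels and classifier scores. The sketch below shows the standard definitions in plain Python. The function names and the 0.5 cutoff are assumptions, and nothing here reproduces the authors' evaluation code or data.

```python
def accuracy(labels, scores, cutoff=0.5):
    """Fraction of samples whose thresholded score matches the 0/1 label."""
    correct = sum((s >= cutoff) == bool(y) for y, s in zip(labels, scores))
    return correct / len(labels)


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a randomly chosen positive scores higher
    than a randomly chosen negative, with ties counted as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise form is quadratic in the number of samples, which is acceptable for a small held-out set; a rank-based computation would be preferable for large ones.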

Availability and implementation: The complete model and data can be found at: https://github.com/linfeng-wang/TB-AMP-design.

Citations: 0
PSO-FeatureFusion: a general framework for fusing heterogeneous features via particle swarm optimization.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-10-29 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf263
Raziyeh Masumshah, Changiz Eslahchi

Motivation: Integrating heterogeneous biological data is a central challenge in bioinformatics, especially when modeling complex relationships among entities such as drugs, diseases, and molecular features. Existing methods often rely on static or separate feature extraction processes, which may fail to capture interactions across diverse feature types and reduce predictive accuracy.

Results: To address these limitations, we propose PSO-FeatureFusion, a unified framework that combines particle swarm optimization with neural networks to jointly integrate and optimize features from multiple biological entities. By modeling pairwise feature interactions and learning their optimal contributions, the framework captures individual feature signals and their interdependencies in a task-agnostic and modular manner. We applied PSO-FeatureFusion to two bioinformatics tasks, drug-drug interaction prediction and drug-disease association prediction, using multiple benchmark datasets. Across both tasks, the framework achieved strong performance across evaluation metrics, often outperforming or matching state-of-the-art baselines, including deep learning and graph-based models. The method also demonstrated robustness with limited hyperparameter tuning and flexibility across datasets with varying feature structures. PSO-FeatureFusion provides a scalable and practical solution for researchers working with high-dimensional biological data. Its adaptability and interpretability make it well suited for applications in drug discovery, disease prediction, and other bioinformatics domains.
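
Learning feature contributions with particle swarm optimization can be illustrated on a toy problem. This is a generic PSO, not the PSO-FeatureFusion code: the neural network component is omitted, the feature table and hidden weights are invented, and all names and hyperparameters are assumptions.

```python
import random

# Invented toy data: three feature scores per entity pair.
FEATURES = [(0.9, 0.1, 0.2), (0.2, 0.8, 0.1), (0.1, 0.3, 0.9), (0.5, 0.5, 0.5)]
HIDDEN_WEIGHTS = (0.2, 0.5, 0.3)  # the weights the swarm should recover
TARGETS = [sum(w * f for w, f in zip(HIDDEN_WEIGHTS, row)) for row in FEATURES]


def fusion_loss(weights):
    """Mean squared error between the weighted feature sum and the targets."""
    errors = [sum(w * f for w, f in zip(weights, row)) - t
              for row, t in zip(FEATURES, TARGETS)]
    return sum(e * e for e in errors) / len(errors)


def pso_minimize(loss, dim, n_particles=30, n_iter=200, seed=0,
                 inertia=0.7, c_personal=1.5, c_global=1.5, lo=0.0, hi=1.0):
    """Minimize loss over [lo, hi]^dim with a basic global-best PSO."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_val = [loss(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Pull toward the particle's own best and the swarm's best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = loss(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val
```

Calling `pso_minimize(fusion_loss, dim=3)` returns the best weight vector found and its loss. In a fuller pipeline the loss would presumably be a validation metric of a downstream model rather than a closed-form error.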

Availability and implementation: The source code and datasets are available at https://github.com/raziyehmasumshah/PSO-FeatureFusion.

Citations: 0
Shiny-Calorie: a context-aware application for indirect calorimetry data analysis and visualization using R.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-10-29 eCollection Date: 2026-01-01 DOI: 10.1093/bioadv/vbaf270
Stephan Grein, Tabea Elschner, Ronja Kardinal, Johanna Bruder, Akim Strohmeyer, Karthikeyan Gunasekaran, Jennifer Witt, Hildigunnur Hermannsdóttir, Janina Behrens, Mueez U-Din, Jiangyan Yu, Gerhard Heldmaier, Renate Schreiber, Jan Rozman, Markus Heine, Ludger Scheja, Anna Worthmann, Joerg Heeren, Dagmar Wachten, Kerstin Wilhelm-Jüngling, Alexander Pfeifer, Jan Hasenauer, Martin Klingenspor

Motivation: Indirect calorimetry is the standard method for metabolic phenotyping of animal models in pre-clinical research, supported by mature experimental protocols and widely used commercial platforms. However, a flexible, extensible, and user-friendly software suite that enables standardized integration of data and metadata from diverse metabolic phenotyping platforms, followed by unified statistical analysis and visualization, remains absent.

Results: We present Shiny-Calorie, an open-source interactive application for transparent data and metadata integration, comprehensive statistical data analysis, and visualization of indirect calorimetry datasets. Shiny-Calorie supports the majority of standard data formats across commercial metabolic phenotyping platforms, such as TSE, Sable Systems, COSMED, and CLAMS/Columbus Instruments, and exports processed data in standardized formats. Built using GNU R with a reactive interface, Shiny-Calorie enables intuitive exploration of complex, multi-modal longitudinal datasets comprising categorical, continuous, ordinal, and count variables. The platform incorporates state-of-the-art statistical methods for robust hypothesis testing, thereby facilitating biologically meaningful interpretation of energy metabolism phenotypes, including resting metabolic rate and energy expenditure. Together, these features streamline routine analysis workflows and enhance reproducibility and transparency in metabolic phenotyping studies.
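
Energy expenditure in indirect calorimetry is derived from measured oxygen consumption and carbon dioxide production. A common choice is the abbreviated Weir equation, sketched below. The abstract does not state which equation Shiny-Calorie applies, so treat this as one standard option. The function names and the litres-per-minute input units are assumptions, and instrument exports in other units would need converting first.

```python
def respiratory_exchange_ratio(vo2_l_min, vco2_l_min):
    """RER = VCO2 / VO2, both in the same units."""
    if vo2_l_min <= 0:
        raise ValueError("VO2 must be positive")
    return vco2_l_min / vo2_l_min


def energy_expenditure_kcal_per_day(vo2_l_min, vco2_l_min):
    """Abbreviated Weir equation (urinary nitrogen term omitted).

    EE [kcal/min] = 3.941 * VO2 + 1.106 * VCO2, with gas volumes in L/min;
    multiplying by 1440 converts to kcal/day.
    """
    return (3.941 * vo2_l_min + 1.106 * vco2_l_min) * 1440
```

An RER near 0.7 is usually read as predominantly fat oxidation and a value near 1.0 as predominantly carbohydrate oxidation.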

Availability and implementation: Shiny-Calorie is freely available at https://shiny.iaas.uni-bonn.de/Shiny-Calorie/. User documentation and source code are available at https://github.com/ICB-DCM/Shiny-Calorie. A docker image is available from https://hub.docker.com/r/stephanmg/Shiny-Calorie. Instructional screen recordings are available on https://www.youtube.com/@shiny-calorie.

Citations: 0
DSA-DeepFM: a dual-stage attention-enhanced DeepFM model for predicting anticancer synergistic drug combinations.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-10-27 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf269
Yuexi Gu, Yongheng Sun, Louxin Zhang, Jian Zu

Motivation: Drug combinations are crucial in combating drug resistance, reducing toxicity, and improving therapeutic outcomes in disease management. Because a large number of drugs are available, the potential combinations increase exponentially, making it impractical to rely solely on biological experiments to identify synergistic combinations. Consequently, machine learning methods are increasingly being used to find synergistic drug combinations. Most existing methods focus on predictive performance through auxiliary data or complex models, but neglecting underlying biological mechanisms limits their accuracy in predicting synergistic drug combinations.

Results: We present DSA-DeepFM, a deep learning model that integrates a dual-stage attention (DSA) mechanism with Factorization Machines (FMs) to predict synergistic two-drug combinations by addressing complex biological feature interactions. The model incorporates categorical and auxiliary numerical inputs to capture both field-aware and embedding-aware patterns. These patterns are then processed by a deep FM module, which captures low- and high-order feature interactions before making the final predictions. Validation testing demonstrates that DSA-DeepFM significantly outperforms traditional machine learning and state-of-the-art deep learning models. Furthermore, t-SNE visualizations confirm the discriminative power of the model at various stages. Additionally, we use our model to identify eight novel synergistic drug combinations, underscoring its practical utility and potential for future applications.
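
The low-order interactions that a factorization machine captures can be shown with the standard second-order FM formula. The sketch covers only the generic FM component, not the dual-stage attention or the deep part of DSA-DeepFM. The function names and the toy parameters used to exercise them are assumptions.

```python
def fm_predict_naive(x, w0, w, v):
    """Second-order FM: w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j."""
    n = len(x)
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(v[i], v[j]))
            pairwise += dot * x[i] * x[j]
    return w0 + linear + pairwise


def fm_predict_fast(x, w0, w, v):
    """Same prediction in O(n * k) using the sum-of-squares identity.

    For each latent factor f:
    sum_{i<j} v_if v_jf x_i x_j = 0.5 * ((sum_i v_if x_i)^2 - sum_i (v_if x_i)^2)
    """
    n, k = len(x), len(v[0])
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        total = sum(v[i][f] * x[i] for i in range(n))
        total_sq = sum((v[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (total * total - total_sq)
    return w0 + linear + pairwise
```

Both functions return the same value. The second avoids the double loop over feature pairs, which matters when the input is a long sparse vector of drug and cell-line features.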

Availability and implementation: Source code is available at https://github.com/gracygyx/DSA-DeepFM.

Citations: 0
TidyGWAS: a scalable approach for standardized cleaning of genome-wide association study summary statistics.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-10-27 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf262
Arvid Harder, Jerry Guintivano, Joëlle A Pasman, Patrick F Sullivan, Yi Lu

Motivation: Genome-wide association studies (GWAS) have transformed human genetics by identifying tens of thousands of trait-associated variants, enabling applications from drug discovery to polygenic risk prediction. These advancements depend critically on open sharing of GWAS summary statistics. However, a lack of standardized formats complicates downstream analyses, requiring extensive dataset-specific "munging" before analysis can proceed.

Results: Here we present tidyGWAS, an R package that streamlines this process by cleanly separating data validation and harmonization from quality control. tidyGWAS uses curated data to repair and harmonize variant identifiers across genome builds, imputes missing columns when possible, and validates summary statistics with minimal filters. Outputs are saved as partitioned parquet files, optimized for high-throughput analysis via the arrow package. Benchmarked against existing tools, tidyGWAS is up to 6.5× faster and substantially more memory efficient. Additionally, we implement a fixed-effects meta-analysis directly on tidyGWAS output, achieving up to 10× speedup over existing software. tidyGWAS simplifies and accelerates statistical genetic workflows, improving reproducibility and scalability for large-scale genetic analyses.
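
A fixed-effects meta-analysis of one variant across studies is commonly done with inverse-variance weighting. The sketch below shows that estimator in Python for consistency with the other examples in this listing. It is not the tidyGWAS R implementation, whose exact estimator the abstract does not specify, and the function name is an assumption.

```python
from math import sqrt


def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted fixed-effects estimate for one variant.

    betas: per-study effect sizes, already aligned to the same effect allele.
    ses:   per-study standard errors (all positive).
    Returns (pooled beta, pooled standard error, z-score).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and of equal length")
    if any(se <= 0 for se in ses):
        raise ValueError("standard errors must be positive")
    weights = [1.0 / (se * se) for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = sqrt(1.0 / total)
    return beta, se, beta / se
```

The estimator assumes every study reports the effect for the same allele, which is why allele and identifier harmonization has to come before pooling.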

Availability and implementation: The package, reference data, and Docker containers are freely available for broad adoption.

Citations: 0
Gaining insights into Alzheimer's disease by predicting chromatin spatial organization.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-10-25 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf268
Camilo Villaman, Irene Cartas-Espinel, Mauricio Saez, Alberto J M Martin

Motivation: CTCF is a conserved protein involved in the establishment and maintenance of topologically associating domains (TADs) and loops. Alzheimer's disease (AD) is the most common form of dementia, affecting over 50 million elderly individuals. Epigenetic alterations are a hallmark of AD, and epigenetic disruptions can affect CTCF binding and looping. Understanding the dynamics of CTCF loops in AD may reveal previously unrecognized contributions of CTCF to the etiology of AD. To study these dynamics, we developed a CTCF loop predictor that uses genomic and epigenomic features such as CTCF motif information, CTCF protein binding information, and several histone marks.

Results: We obtained F-scores above 0.9 in the GM12878 and K562 cell lines. We report the importance of each feature for classification and compare the results with other loop predictors. After testing the predictor, we predicted loops in control and AD data, computed a loop-disruption score, and selected the most disrupted loops in AD, all of which had previously been linked to AD in the literature. Our study contributes to a better understanding of the role of CTCF binding and CTCF loops in gene regulation, and highlights new clues about CTCF in the etiology and development of AD.
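
As a reminder of what the reported metric measures, the F-score combines precision and recall into a single value. The sketch below computes it from confusion-matrix counts in Python. The abstract does not say which F-score variant was used, so the default `beta=1.0` (the F1 score) is an assumption, and the function name `f_score` and the example counts are hypothetical.

```python
def f_score(tp, fp, fn, beta=1.0):
    """F-beta score from true positives, false positives and false negatives.

    beta=1.0 gives the F1 score (harmonic mean of precision and recall).
    Illustrative helper only; the counts are not taken from the paper.
    """
    if tp == 0:
        # No true positives: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical classifier: 90 loops found correctly, 10 spurious, 10 missed.
score = f_score(tp=90, fp=10, fn=10)
```

A score above 0.9 therefore requires both precision and recall to be high, since the harmonic mean is pulled toward the lower of the two.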

Availability and implementation: The method is available at https://github.com/networkbiolab/jalpy.

Citations: 0
MutSeqR: an open source R package for standardized analysis of error-corrected next-generation sequencing data in genetic toxicology.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-10-23 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf265
Annette E Dodge, Andrew Williams, Danielle P M LeBlanc, David M Schuster, Elena Esina, Charles C Valentine, Jesse J Salk, Alex Y Maslov, Chris Bradley, Carole L Yauk, Francesco Marchetti, Matthew J Meier

Motivation: Error-corrected next-generation sequencing (ECS) methods are increasingly used to assess mutagenicity and other genetic toxicology endpoints. The lack of open and standardized bioinformatic workflows and tools poses challenges to data reproducibility, comparability, and consistency of interpretation when ECS is applied in genetic toxicity assessment.

Results: We present MutSeqR, an open source R package to analyse ECS mutation data for genetic toxicology studies. MutSeqR offers practical variant filtering, comparative analysis of mutation frequency between experimental conditions, dose-response assessment via benchmark dose calculations, mutation spectrum analysis, and clonality analyses. We demonstrate MutSeqR's application using published datasets on mice treated with benzo[a]pyrene or benzo[b]fluoranthene, analysed using Duplex Sequencing and SMM-seq, respectively. MutSeqR's flexible functions enable reproducible analyses across ECS platforms, facilitating research and regulatory applications in mutagenicity testing.

Availability and implementation: MutSeqR is freely available under an open source license at https://github.com/EHSRB-BSRSE-Bioinformatics/MutSeqR. Implemented in R (version 3.4.0 or greater), it supports all major operating systems. Sequencing data for Project 1 has been deposited in the Sequence Read Archive under accession number PRJNA803048. Variant call files for Project 2 are available on Mendeley Data (doi: 10.17632/65dnysxym8.1).

Citations: 0
RNA-EFM: energy-based flow matching for protein-conditioned RNA sequence-structure co-design.
IF 2.8 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-10-22 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf258
Abrar Rahman Abir, Liqing Zhang

Motivation: Designing RNA molecules that can specifically bind to target proteins is fundamental to numerous biological and therapeutic applications. However, existing approaches to protein-conditioned RNA design primarily focus on structural alignment or sequence recovery, often ignoring essential biophysical factors such as molecular stability and thermodynamic feasibility.

Results: To address this gap, we propose RNA-EFM, a novel deep learning framework that integrates energy-based refinement with flow matching for protein-conditioned RNA sequence and structure co-design. RNA-EFM consists of two complementary components: a flow matching objective that supervises geometric alignment between predicted and native RNA backbone structures, and an energy-based idempotent refinement that iteratively improves RNA structure predictions by minimizing both structural error and physical energy. The energy refinement is guided by biophysical priors including the Lennard-Jones potential and sequence-derived free energy, ensuring that the generated RNAs are not only geometrically plausible but also thermodynamically stable. We demonstrate the effectiveness of RNA-EFM through extensive experiments. RNA-EFM significantly outperforms state-of-the-art baselines in terms of RMSD, lDDT, sequence recovery, and binding energy improvement. These results highlight the importance of incorporating biophysical constraints into RNA design and establish RNA-EFM as a promising framework.
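
The Lennard-Jones potential mentioned above has a standard textbook form, V(r) = 4ε[(σ/r)^12 − (σ/r)^6], which penalizes atoms that sit too close together and weakly rewards those near the optimal distance. The Python sketch below evaluates that form for a single pair of atoms. It is not taken from RNA-EFM: the abstract does not specify how the term enters the model, and the function name `lennard_jones` and the default values of `epsilon` and `sigma` are placeholders.

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Standard 12-6 Lennard-Jones pair potential at separation r.

    epsilon: depth of the energy well; sigma: distance at which V(r) = 0.
    Placeholder units and defaults; not the RNA-EFM energy term itself.
    """
    if r <= 0:
        raise ValueError("separation r must be positive")
    sr6 = (sigma / r) ** 6
    # Repulsive (sr6 squared, i.e. the r^-12 term) minus attractive (r^-6).
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


# The energy minimum lies at r = 2**(1/6) * sigma, where V = -epsilon.
r_min = 2.0 ** (1.0 / 6.0)
v_min = lennard_jones(r_min)
```

Summing such pairwise terms over atom pairs gives a simple physical energy that a refinement step could try to lower, which is the general idea the abstract describes.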

Availability and implementation: The source code for RNA-EFM is available at: https://github.com/abrarrahmanabir/RNA-EFM.

Citations: 0