
Bioinformatics (Oxford, England): Latest Articles

Dynamic modelling of signalling pathways when ODEs are not feasible.
Pub Date : 2024-11-18 DOI: 10.1093/bioinformatics/btae683
Timo Rachel, Eva Brombacher, Svenja Wöhrle, Olaf Groß, Clemens Kreutz

Motivation: Mathematical modelling plays a crucial role in understanding inter- and intracellular signalling processes. Currently, ordinary differential equations (ODEs) are the predominant approach in systems biology for modelling such pathways. While ODE models offer mechanistic interpretability, they also suffer from limitations, including the need to consider all relevant compounds, which results in large models that are difficult to handle numerically and require extensive data.

Results: In previous work, we introduced the retarded transient function (RTF) as an alternative method for modelling temporal responses of signalling pathways. Here, we extend the RTF approach to integrate concentration or dose-dependencies into the modelling of dynamics. With this advancement, RTF modelling now fully encompasses the application range of ordinary differential equation (ODE) models, which comprises predictions in both time and concentration domains. Moreover, characterizing dose-dependencies provides an intuitive way to investigate and characterize signalling differences between biological conditions or cell-types based on their response to stimulating inputs. To demonstrate the applicability of our extended approach, we employ data from time- and dose-dependent inflammasome activation in bone-marrow derived macrophages (BMDMs) treated with nigericin sodium salt. Our results show the effectiveness of the extended RTF approach as a generic framework for modelling dose-dependent kinetics in cellular signalling. The approach results in intuitively interpretable parameters that describe signal dynamics and enables predictive modelling of time- and dose-dependencies even if only individual cellular components are quantified.
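The idea of a response curve whose shape parameters depend on dose can be illustrated with a small sketch. This is not the authors' RTF parameterization, which the abstract does not spell out. It assumes a sustained and a transient component whose amplitudes follow a Hill-type dose dependence, and every function and parameter name below is hypothetical.

```python
import math


def hill(dose, maximum, ec50, hill_coeff):
    """Hill-type dose dependence: 0 at dose 0, approaching `maximum` at saturation."""
    if dose <= 0:
        return 0.0
    return maximum * dose**hill_coeff / (ec50**hill_coeff + dose**hill_coeff)


def response(t, dose, *, baseline=0.0, sustained_max=1.0, transient_max=2.0,
             ec50=1.0, hill_coeff=2.0, tau_rise=1.0, tau_decay=5.0):
    """Sustained plus transient response at time t >= 0 for a given dose.

    Assumed form (illustrative only):
      baseline + A_sus(dose) * rise(t) + A_trans(dose) * rise(t) * exp(-t / tau_decay)
    with rise(t) = 1 - exp(-t / tau_rise).
    """
    rise = 1.0 - math.exp(-t / tau_rise)
    a_sus = hill(dose, sustained_max, ec50, hill_coeff)
    a_trans = hill(dose, transient_max, ec50, hill_coeff)
    return baseline + a_sus * rise + a_trans * rise * math.exp(-t / tau_decay)


# Evaluate the toy model on a small time-by-dose grid.
grid = {(t, d): response(t, d) for t in (0.0, 2.0, 1000.0) for d in (0.0, 0.5, 1.0, 2.0)}
```

In this sketch a handful of interpretable parameters (amplitudes, time constants, half-maximal dose) describe the whole time-by-dose surface, which is the kind of compact description the abstract attributes to the extended RTF approach.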

Availability: The presented approach is available within the MATLAB-based Data2Dynamics modelling toolbox at https://github.com/Data2Dynamics and https://zenodo.org/records/14008247 and as R code at https://github.com/kreutz-lab/RTF.

Citations: 0
M2ara: unraveling metabolomic drug responses in whole-cell MALDI mass spectrometry bioassays.
Pub Date : 2024-11-18 DOI: 10.1093/bioinformatics/btae694
Thomas Enzlein, Alexander Geisel, Carsten Hopf, Stefan Schmidt

Summary: Fast computational evaluation and classification of concentration responses for hundreds of metabolites represented by their mass-to-charge (m/z) ratios is indispensable for unraveling complex metabolomic drug actions in label-free, whole-cell Matrix-Assisted Laser Desorption/Ionization Mass Spectrometry (MALDI MS) bioassays. In particular, the identification of novel pharmacodynamic biomarkers to determine target engagement, potency and potential polypharmacology of drug-like compounds in high-throughput applications requires robust data interpretation pipelines. Given the large number of mass features in cell-based MALDI MS bioassays, reliable identification of true biological response patterns and their differentiation from any measurement artefacts that may be present is critical. To facilitate the exploration of metabolomic responses in complex MALDI MS datasets, we present a novel software tool, M2ara. Implemented as a user-friendly R-based shiny application, it enables rapid evaluation of Molecular High Content Screening (MHCS) assay data. Furthermore, we introduce the concept of Curve Response Score (CRS) and CRS fingerprints to enable rapid visual inspection and ranking of mass features. In addition, these CRS fingerprints allow direct comparison of cellular effects among different compounds. Beyond cellular assays, our computational framework can also be applied to MALDI MS-based (cell-free) biochemical assays in general.

Availability and implementation: The software tool, code and examples are available at https://github.com/CeMOS-Mannheim/M2ara and https://dx.doi.org/10.6084/m9.figshare.25736541.

Supplementary information: Supplementary material is available at Bioinformatics online.

Citations: 0
Tiberius: End-to-end deep learning with an HMM for gene prediction.
Pub Date : 2024-11-18 DOI: 10.1093/bioinformatics/btae685
Lars Gabriel, Felix Becker, Katharina J Hoff, Mario Stanke

Motivation: For more than 25 years, learning-based eukaryotic gene predictors were driven by hidden Markov models (HMMs) that took a DNA sequence directly as input. Recently, Holst et al. demonstrated with their program Helixer that the accuracy of ab initio eukaryotic gene prediction can be improved by combining deep learning layers with a separate HMM postprocessor.

Results: We present Tiberius, a novel deep learning-based ab initio gene predictor that integrates convolutional and long short-term memory layers with a differentiable HMM layer in a single end-to-end model. Tiberius uses a custom gene prediction loss; it was trained for prediction in mammalian genomes and evaluated on human and two other genomes. It significantly outperforms existing ab initio methods, achieving a gene-level F1-score of 62% for the human genome, compared to 21% for the next best ab initio method. In de novo mode, Tiberius predicts the exon-intron structure of two out of three human genes without error. Remarkably, even Tiberius's ab initio accuracy matches that of BRAKER3, which uses RNA-seq data and a protein database. Tiberius's highly parallelized model is the fastest state-of-the-art gene prediction method, processing the human genome in under 2 hours.
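To make the gene-level F1-score concrete, here is a minimal sketch of the usual definition, in which a predicted gene counts as correct only if its complete exon structure matches a reference gene exactly. The abstract does not describe the exact evaluation protocol, so this is an assumption, and the coordinates are made-up toy data.

```python
def gene_level_f1(predicted, reference):
    """F1 where a gene is a true positive only if its full exon chain matches exactly."""
    pred = set(predicted)
    ref = set(reference)
    true_positives = len(pred & ref)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)  # fraction of predictions that are correct
    recall = true_positives / len(ref)      # fraction of reference genes recovered
    return 2 * precision * recall / (precision + recall)


# Each gene is a tuple of (start, end) exon coordinates (hypothetical toy data).
reference = [
    ((100, 200), (300, 400)),
    ((1000, 1100),),
    ((2000, 2100), (2200, 2300)),
]
predicted = [
    ((100, 200), (300, 400)),      # exact match
    ((1000, 1150),),               # wrong exon end, counts as an error
    ((2000, 2100), (2200, 2300)),  # exact match
    ((5000, 5100),),               # spurious gene
]
score = gene_level_f1(predicted, reference)
```

Here two of four predictions are exact (precision 1/2) and two of three reference genes are recovered (recall 2/3), so the F1-score is 4/7. A single wrong exon boundary makes the whole gene count as an error under this strict definition.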

Availability and implementation: https://github.com/Gaius-Augustus/Tiberius.

Citations: 0
STRPsearch: fast detection of structured tandem repeat proteins.
Pub Date : 2024-11-18 DOI: 10.1093/bioinformatics/btae690
Soroush Mozaffari, Paula Nazarena Arrías, Damiano Clementel, Damiano Piovesan, Carlo Ferrari, Silvio C E Tosatto, Alexander Miguel Monzon

Motivation: Structured Tandem Repeats Proteins (STRPs) constitute a subclass of tandem repeats characterized by repetitive structural motifs. These proteins exhibit distinct secondary structures that form repetitive tertiary arrangements, often resulting in large molecular assemblies. Despite highly variable sequences, STRPs can perform important and diverse biological functions, maintaining a consistent structure with a variable number of repeat units. With the advent of protein structure prediction methods, millions of 3D-models of proteins are now publicly available. However, automatic detection of STRPs remains challenging with current state-of-the-art tools due to their lack of accuracy and long execution times, hindering their application on large datasets. In most cases, manual curation remains the most accurate method for detecting and classifying STRPs, making it impracticable to annotate millions of structures.

Results: We introduce STRPsearch, a novel tool for the rapid identification, classification, and mapping of STRPs. Leveraging manually curated entries from RepeatsDB as the known conformational space of STRPs, STRPsearch employs the latest advances in structural alignment for a fast and accurate detection of repeated structural motifs in proteins, followed by an innovative approach to map units and insertions through the generation of TM-score profiles. STRPsearch is highly scalable, efficiently processing large datasets, and can be applied to both experimental structures and predicted models. Additionally, it demonstrates superior performance compared to existing tools, offering researchers a reliable and comprehensive solution for STRP analysis across diverse proteomes.
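For readers unfamiliar with the TM-score mentioned above, the sketch below evaluates the standard formula for one fixed superposition, given the distances between aligned residue pairs. It does not reproduce how STRPsearch builds its TM-score profiles, which the abstract does not describe. The lower bound of 0.5 on the scale `d0` for short chains is an assumption made here to keep the formula well defined.

```python
def tm_d0(length):
    """Length-dependent distance scale: d0 = 1.24 * (L - 15)^(1/3) - 1.8 (in angstroms)."""
    if length <= 15:
        return 0.5  # assumed floor for very short chains
    return max(0.5, 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, target_length):
    """TM-score for one superposition.

    `distances` holds the distance (in angstroms) of each aligned residue pair;
    residues of the target that are not aligned contribute nothing to the sum.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# A perfect superposition of all 50 residues versus one covering only half of them.
perfect = tm_score([0.0] * 50, 50)
half_covered = tm_score([0.0] * 25, 50)
```

A full TM-score is the maximum of this quantity over all superpositions. Because the sum is normalized by the target length, partial coverage lowers the score even when every aligned residue fits perfectly.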

Availability and implementation: STRPsearch is coded in Python. All scripts and associated documentation are available from: https://github.com/BioComputingUP/STRPsearch.

Supplementary information: Supplementary data are available at Bioinformatics online.

Citations: 0
Damsel: Analysis and visualisation of DamID sequencing in R.
Pub Date : 2024-11-18 DOI: 10.1093/bioinformatics/btae695
Caitlin G Page, Andrew Londsdale, Katrina A Mitchell, Jan Schröder, Kieran F Harvey, Alicia Oshlack

Summary: DamID sequencing is a technique for mapping the genome-wide interaction of a protein with DNA. Damsel is the first Bioconductor package to provide an end-to-end analysis of DamID sequencing data within R. Damsel performs quantification and testing of significant binding sites along with exploratory and visual analysis. Damsel produces results consistent with previous analysis approaches.

Availability: The R package Damsel can be installed through the Bioconductor project (https://bioconductor.org/packages/release/bioc/html/Damsel.html), and the code is available on GitHub (https://github.com/Oshlack/Damsel/).

Supplementary information: Supplementary data are available at Bioinformatics online.

Citations: 0
Sensitivities in protein allocation models reveal distribution of metabolic capacity and flux control.
Pub Date : 2024-11-18 DOI: 10.1093/bioinformatics/btae691
Samira van den Bogaard, Pedro A Saa, Tobias B Alter

Motivation: Expanding on constraint-based metabolic models, protein allocation models (PAMs) enhance flux predictions by accounting for protein resource allocation in cellular metabolism. Yet, to this date, there are no dedicated methods for analyzing and understanding the growth-limiting factors in simulated phenotypes in PAMs.

Results: Here, we introduce a systematic framework for identifying the most sensitive enzyme concentrations (sEnz) in PAMs. The framework exploits the primal and dual formulations of these models to derive sensitivity coefficients based on relations between variables, constraints, and the objective function. This approach enhances our understanding of the growth-limiting factors of metabolic phenotypes under specific environmental or genetic conditions. Compared to other traditional methods for calculating sensitivities, sEnz requires substantially less computation time and facilitates more intuitive comparison and analysis of sensitivities. The sensitivities calculated by sEnz cover enzymes, reactions and protein sectors, enabling a holistic overview of the factors influencing metabolism. When applied to an Escherichia coli PAM, sEnz revealed major pathways and enzymes driving overflow metabolism. Overall, sEnz offers a computationally efficient framework for understanding PAM predictions and unravelling the factors governing a particular metabolic phenotype.
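The link between dual variables and sensitivities can be seen in a toy linear program. The sketch below is not the sEnz formulation or the PAModelpy API. It assumes a single shared protein budget, so that both the primal and the dual problem have closed-form solutions, and all names are hypothetical. A real PAM has many constraints and needs an LP solver.

```python
def solve_primal(yields, costs, budget):
    """Maximise sum(yields[i] * x[i]) subject to sum(costs[i] * x[i]) <= budget, x >= 0.

    With one constraint, the optimum spends the whole budget on the reaction
    with the best yield-to-cost ratio.
    """
    best = max(range(len(yields)), key=lambda i: yields[i] / costs[i])
    x = [0.0] * len(yields)
    x[best] = budget / costs[best]
    return yields[best] * x[best], x


def solve_dual(yields, costs):
    """Dual problem: the smallest price y >= 0 with costs[i] * y >= yields[i] for all i."""
    return max(y / c for y, c in zip(yields, costs))


yields = [1.0, 3.0, 2.0]  # objective gain per unit flux (toy numbers)
costs = [2.0, 4.0, 2.0]   # protein cost per unit flux (toy numbers)
budget = 10.0

objective, fluxes = solve_primal(yields, costs, budget)
shadow_price = solve_dual(yields, costs)

# The dual variable equals the change in the optimum per unit of extra budget.
step = 0.5
finite_difference = (solve_primal(yields, costs, budget + step)[0] - objective) / step
```

The dual solution gives the sensitivity of the optimum to the budget from a single solve, whereas the finite-difference estimate needs the problem to be solved again for every perturbed constraint. This is the general reason dual-based sensitivities are cheap to compute.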

Availability and implementation: sEnz is implemented in PAModelpy (v.0.0.3.3), a modular Python toolbox for the generation and analysis of PAMs, available on PyPI (https://pypi.org/project/PAModelpy/). The source code, together with all other Python scripts and notebooks, is available on GitHub (https://github.com/iAMB-RWTH-Aachen/PAModelpy).

Supplementary information: Supplementary data are available at Bioinformatics online.

Citations: 0
Facilitating phenotyping from clinical texts: the medkit library.
Pub Date : 2024-11-15 DOI: 10.1093/bioinformatics/btae681
Antoine Neuraz, Ghislain Vaillant, Camila Arias, Olivier Birot, Kim-Tam Huynh, Thibaut Fabacher, Alice Rogier, Nicolas Garcelon, Ivan Lerner, Bastien Rance, Adrien Coulet

Summary: Phenotyping consists of applying algorithms to identify individuals associated with a specific, potentially complex, trait or condition, typically from a collection of Electronic Health Records (EHRs). Because much of the clinical information in EHRs resides in free text, phenotyping from text plays an important role in studies that rely on the secondary use of EHRs. However, the heterogeneity and highly specialized nature of both the content and form of clinical texts make this task particularly tedious and are a source of time and cost constraints in observational studies.

Results: To facilitate the development, evaluation and reproducibility of phenotyping pipelines, we developed an open-source Python library named medkit. It enables the composition of data processing pipelines from easy-to-reuse software bricks, named medkit operations. In addition to the core of the library, we share the operations and pipelines we have already developed and invite the phenotyping community to reuse and enrich them.
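The notion of a pipeline assembled from reusable operations can be sketched in plain Python. The code below deliberately does not use medkit and does not mirror its API. Every function name is hypothetical, and the keyword and negation rules are far too crude for real clinical text.

```python
import re


def split_sentences(texts):
    """Operation 1: split each document into sentences at ., ! or ? followed by whitespace."""
    sentences = []
    for text in texts:
        parts = re.split(r"(?<=[.!?])\s+", text)
        sentences.extend(p.strip() for p in parts if p.strip())
    return sentences


def keep_mentions(keyword):
    """Operation factory: keep only the sentences that mention `keyword`."""
    def operation(sentences):
        return [s for s in sentences if keyword.lower() in s.lower()]
    return operation


def drop_negated(sentences):
    """Operation 3: discard sentences containing a simple negation cue."""
    cue = re.compile(r"\b(no|denies|without)\b", re.IGNORECASE)
    return [s for s in sentences if not cue.search(s)]


def run_pipeline(operations, texts):
    """Apply each operation in order, feeding the output of one into the next."""
    data = texts
    for operation in operations:
        data = operation(data)
    return data


notes = [
    "Patient reports asthma since childhood. No fever today.",
    "Denies asthma. Cough for two days.",
]
hits = run_pipeline([split_sentences, keep_mentions("asthma"), drop_negated], notes)
```

Because every operation has the same shape (a list of strings in, a list of strings out), steps can be reordered, swapped or reused across pipelines. That is the general design idea behind composing pipelines from small bricks.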

Availability and implementation: medkit is available at https://github.com/medkit-lib/medkit.

Supplementary information: Documentation, examples and tutorials are available at https://medkit-lib.org/.

引用次数: 0
R3DMCS: a web server for visualizing structural variation in RNA motifs across experimental 3D structures from the same organism or across species.
Pub Date : 2024-11-15 DOI: 10.1093/bioinformatics/btae682
Sri Devan Appasamy, Craig L Zirbel

Motivation: Recent progress in RNA structure determination methods has resulted in a surge of newly solved RNA 3D structures. However, no user-friendly, browser-based tool exists to facilitate the comparison and visualization of RNA motifs across multiple 3D structures.

Results: We introduce R3DMCS, a web server that allows users to compare selected RNA nucleotides across all 3D structures of a given molecule from a given species, or across all 3D structures mapped to a single Rfam family. Starting from one instance of the motif, R3DMCS retrieves, aligns, annotates, organizes, and displays 3D coordinates of corresponding sets of nucleotides from other 3D structures. With R3DMCS, one can explore conformational changes of motifs due to 3D structures being solved in different functional states or different experimental conditions. One can also investigate conservation of 3D structure across species, or changes in 3D structure due to changes in sequence.

Availability: R3DMCS is open-source software and freely available at https://rna.bgsu.edu/correspondence/ and https://github.com/BGSU-RNA/RNA-3D-correspondence.

Supplementary information: Supplementary data are available at Bioinformatics online.

LmRaC: a functionally extensible tool for LLM interrogation of user experimental results.
Pub Date : 2024-11-15 DOI: 10.1093/bioinformatics/btae679
Douglas B Craig, Sorin Drăghici

Motivation: Large Language Models (LLMs) have provided spectacular results across a wide variety of domains. However, persistent concerns about hallucination and fabrication of authoritative sources raise serious issues for their integral use in scientific research. Retrieval-augmented generation (RAG) is a technique for making data and documents, otherwise unavailable during training, available to the LLM for reasoning tasks. In addition to making dynamic and quantitative data available to the LLM, RAG provides the means by which to carefully control and trace source material, thereby ensuring results are accurate, complete and authoritative.

Results: Here we introduce LmRaC, an LLM-based tool capable of answering complex scientific questions in the context of a user's own experimental results. LmRaC allows users to dynamically build domain-specific knowledge bases from PubMed sources (RAGdom). Answers are drawn solely from this RAG with citations to the paragraph level, virtually eliminating any chance of hallucination or fabrication. These answers can then be used to construct an experimental context (RAGexp) that, along with user-supplied documents (e.g., design, protocols) and quantitative results, can be used to answer questions about the user's specific experiment. Questions about quantitative experimental data are integral to LmRaC and are supported by a user-defined and functionally extensible REST API server (RAGfun).
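
The idea of answering only from retrieved paragraphs, each carrying a citation key, can be illustrated with a toy sketch. This is not LmRaC's implementation or API: the function names, the bag-of-words cosine scoring, and the `source:paragraph` key format are all assumptions chosen for illustration, and a real system would likely use learned embeddings and a vector store.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text and split it into alphanumeric tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, paragraphs, k=2):
    # paragraphs maps a hypothetical citation key ("source:paragraph")
    # to the paragraph text. Returns up to k (key, text) pairs, best first.
    q = Counter(tokenize(question))
    scored = [(cosine(q, Counter(tokenize(text))), key)
              for key, text in paragraphs.items()]
    scored.sort(reverse=True)
    return [(key, paragraphs[key]) for score, key in scored[:k] if score > 0]


def build_prompt(question, hits):
    # Put only the retrieved paragraphs, tagged with their keys, in the prompt
    # so that any answer can be traced back to a specific paragraph.
    context = "\n".join(f"[{key}] {text}" for key, text in hits)
    return ("Answer using only the cited paragraphs below.\n"
            f"{context}\nQuestion: {question}")
```

Keeping the citation key attached to each paragraph from retrieval through prompt construction is what makes paragraph-level attribution possible in a design like this.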

Availability and implementation: Detailed documentation for LmRaC along with a sample REST API server for defining user functions can be found at https://github.com/dbcraig/LmRaC. The LmRaC web application image can be pulled from Docker Hub (https://hub.docker.com) as dbcraig/lmrac.

Optimizing Multi-Omics Data Imputation with NMF and GAN Synergy.
Pub Date : 2024-11-15 DOI: 10.1093/bioinformatics/btae674
Md Istiaq Ansari, Khandakar Tanvir Ahmed, Wei Zhang

Motivation: Integrating multiple omics datasets can significantly advance our understanding of disease mechanisms, physiology, and treatment responses. However, a major challenge in multi-omics studies is the disparity in sample sizes across different datasets, which can introduce bias and reduce statistical power. To address this issue, we propose a novel framework, OmicsNMF, designed to impute missing omics data and enhance disease phenotype prediction. OmicsNMF integrates Generative Adversarial Networks (GANs) with Non-Negative Matrix Factorization (NMF). NMF is a well-established method for uncovering underlying patterns in omics data, while GANs enhance the imputation process by generating realistic data samples. This synergy aims to more effectively address sample size disparity, thereby improving data integration and prediction accuracy.
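
As a rough illustration of the NMF component only, the sketch below imputes missing entries of a non-negative matrix with masked multiplicative updates. It is not the authors' OmicsNMF code and contains no GAN; the function name, the `mask` convention, and the default rank and iteration count are assumptions made for this example.

```python
import numpy as np


def masked_nmf_impute(X, mask, rank=2, n_iter=500, eps=1e-9, seed=0):
    """Fill missing entries of a non-negative matrix X using masked NMF.

    mask[i, j] is 1.0 where X[i, j] is observed and 0.0 where it is missing.
    Observed entries are returned unchanged; missing ones are taken from W @ H.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    # Zero out missing entries (this also neutralizes NaNs at those positions).
    X_obs = np.where(mask > 0, X, 0.0)
    for _ in range(n_iter):
        # Multiplicative updates restricted to observed entries; they keep
        # W and H non-negative as long as X_obs is non-negative.
        W *= (X_obs @ H.T) / ((mask * (W @ H)) @ H.T + eps)
        H *= (W.T @ X_obs) / (W.T @ (mask * (W @ H)) + eps)
    return np.where(mask > 0, X, W @ H)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.random((6, 2)) @ rng.random((2, 5))  # synthetic low-rank data
    mask = np.ones_like(X)
    mask[0, 1] = 0.0  # pretend these two entries are missing
    mask[3, 4] = 0.0
    print(masked_nmf_impute(X, mask))
```

The mask keeps missing entries out of the reconstruction error, so the factors are fitted to observed values only and the product `W @ H` supplies the imputed ones.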

Results: For evaluation, we focused on predicting breast cancer subtypes using the imputed data generated by our proposed framework, OmicsNMF. Our results indicate that OmicsNMF consistently outperforms baseline methods. We further assessed the quality of the imputed data through survival analysis, revealing that the imputed omics profiles provide significant prognostic power for both overall survival and disease-free status. Overall, OmicsNMF effectively leverages GANs and NMF to impute missing samples while preserving key biological features. This approach shows potential for advancing precision oncology by improving data integration and analysis.

Availability and implementation: Source code is available at: https://github.com/compbiolabucf/OmicsNMF.
