Briefings in Bioinformatics: Latest Articles

Whole-genome bisulfite sequencing data analysis learning module on Google Cloud Platform.
IF 6.8, Zone 2 (Biology), Q1 Biochemical Research Methods, Pub Date: 2024-07-23, DOI: 10.1093/bib/bbae236
Yujia Qin, Angela Maggio, Dale Hawkins, Laura Beaudry, Allen Kim, Daniel Pan, Ting Gong, Yuanyuan Fu, Hua Yang, Youping Deng

This study describes the development of a resource module that is part of a learning platform named 'NIGMS Sandbox for Cloud-based Learning' (https://github.com/NIGMS/NIGMS-Sandbox). The overall genesis of the Sandbox is described in the editorial NIGMS Sandbox at the beginning of this Supplement. The module is designed to facilitate interactive learning of whole-genome bisulfite sequencing (WGBS) data analysis using cloud-based tools in Google Cloud Platform, such as Cloud Storage, Vertex AI notebooks and Google Batch. WGBS is a powerful technique that provides comprehensive insight into DNA methylation patterns at single-cytosine resolution, which is essential for understanding epigenetic regulation across the genome. The module first provides step-by-step tutorials that guide learners through the two main stages of WGBS data analysis: preprocessing and the identification of differentially methylated regions. It then provides a streamlined workflow and demonstrates how to apply it to large datasets using cloud infrastructure. Together, these interconnected submodules progressively deepen the user's understanding of the WGBS analysis process and of cloud resources. Through this module, we can enhance the accessibility and adoption of cloud computing in epigenomic research, speeding up advances in this and related fields.

Citations: 0
CCPA: cloud-based, self-learning modules for consensus pathway analysis using GO, KEGG and Reactome.
IF 6.8, Zone 2 (Biology), Q1 Biochemical Research Methods, Pub Date: 2024-07-23, DOI: 10.1093/bib/bbae222
Ha Nguyen, Van-Dung Pham, Hung Nguyen, Bang Tran, Juli Petereit, Tin Nguyen

This manuscript describes the development of a resource module that is part of a learning platform named 'NIGMS Sandbox for Cloud-based Learning' (https://github.com/NIGMS/NIGMS-Sandbox). The module delivers learning materials on Cloud-based Consensus Pathway Analysis in an interactive format that uses appropriate cloud resources for data access and analyses. Pathway analysis is important because it allows us to gain insights into the biological mechanisms underlying conditions. However, the large number of available pathway analysis methods, the coding skills they require, and the focus of current tools on only a few species make it very difficult for biomedical researchers to self-learn and perform pathway analysis efficiently. Furthermore, there is a lack of tools that allow researchers to compare results obtained from different experiments and different analysis methods in order to find consensus results. To address these challenges, we have designed a cloud-based, self-learning module that provides consensus results among established, state-of-the-art pathway analysis techniques and gives students and researchers the necessary training and example materials. The training module consists of five Jupyter Notebooks that provide complete tutorials for the following tasks: (i) process expression data, (ii) perform differential analysis, then visualize and compare the results obtained from four differential analysis methods (limma, t-test, edgeR, DESeq2), (iii) process three pathway databases (GO, KEGG and Reactome), (iv) perform pathway analysis using eight methods (ORA, CAMERA, KS test, Wilcoxon test, FGSEA, GSA, SAFE and PADOG) and (v) combine the results of multiple analyses. We also provide examples, source code, explanations and instructional videos to help trainees complete each Jupyter Notebook. The module supports analysis for many model species (e.g. human, mouse, fruit fly, zebrafish) as well as non-model species. The module is publicly available at https://github.com/NIGMS/Consensus-Pathway-Analysis-in-the-Cloud.
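Task (v), combining the results of multiple analyses, can be illustrated with a small sketch. The abstract does not say which combination rule the module uses, so Fisher's method is an assumption chosen here for illustration, and the function names `fisher_combine` and `consensus` are hypothetical. Fisher's method also assumes independent tests, which does not strictly hold when several methods analyze the same expression data.

```python
import math


def fisher_combine(p_values):
    """Combine p-values with Fisher's method (assumes independent tests)."""
    if not p_values:
        raise ValueError("need at least one p-value")
    k = len(p_values)
    # Test statistic: -2 * sum(ln p); clamp p to avoid log(0).
    stat = -2.0 * sum(math.log(max(p, 1e-300)) for p in p_values)
    half = stat / 2.0
    # Chi-square survival function with 2k degrees of freedom has a
    # closed form: exp(-x/2) * sum_{i<k} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return min(1.0, math.exp(-half) * total)


def consensus(results):
    """Map {method: {pathway: p}} to {pathway: combined p}.

    Only pathways reported by every method are kept.
    """
    shared = set.intersection(*(set(r) for r in results.values()))
    return {
        pw: fisher_combine([r[pw] for r in results.values()])
        for pw in sorted(shared)
    }


if __name__ == "__main__":
    # Illustrative p-values only, not taken from the paper.
    demo = {
        "ORA": {"pathway_A": 0.01, "pathway_B": 0.50},
        "FGSEA": {"pathway_A": 0.01, "pathway_C": 0.20},
    }
    print(consensus(demo))
```

Restricting the output to pathways shared by all methods keeps the combined values comparable; a real pipeline might instead impute or down-weight missing results.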

Citations: 0
CloudATAC: a cloud-based framework for ATAC-Seq data analysis.
IF 6.8, Zone 2 (Biology), Q1 Biochemical Research Methods, Pub Date: 2024-07-23, DOI: 10.1093/bib/bbae090
Avinash M Veerappa, M Jordan Rowley, Angela Maggio, Laura Beaudry, Dale Hawkins, Allen Kim, Sahil Sethi, Paul L Sorgen, Chittibabu Guda

Assay for transposase-accessible chromatin with high-throughput sequencing (ATAC-seq) generates genome-wide chromatin accessibility profiles, providing valuable insights into epigenetic gene regulation at both pooled-cell and single-cell population levels. Comprehensive analysis of ATAC-seq data involves the use of various interdependent programs. Learning the correct sequence of steps needed to process the data can represent a major hurdle. Selecting appropriate parameters at each stage, including pre-analysis, core analysis, and advanced downstream analysis, is important to ensure accurate analysis and interpretation of ATAC-seq data. Additionally, obtaining and working within a limited computational environment presents a significant challenge to non-bioinformatic researchers. Therefore, we present CloudATAC, an open-source, cloud-based interactive framework that offers scalable, flexible, and streamlined analysis based on a best-practices approach for pooled-cell and single-cell ATAC-seq data. These frameworks use the on-demand computational power and memory, scalability, and secure and compliant environment provided by Google Cloud. Additionally, we leverage Jupyter Notebook's interactive computing platform, which combines live code, tutorials, narrative text, flashcards, quizzes, and custom visualizations to enhance learning and analysis. Further, leveraging GPU instances has significantly improved the run-time of the single-cell framework. The source code and data are publicly available through NIH Cloud Lab at https://github.com/NIGMS/ATAC-Seq-and-Single-Cell-ATAC-Seq-Analysis. This manuscript describes the development of a resource module that is part of a learning platform named "NIGMS Sandbox for Cloud-based Learning" (https://github.com/NIGMS/NIGMS-Sandbox). The overall genesis of the Sandbox is described in the editorial NIGMS Sandbox [1] at the beginning of this Supplement. This module delivers learning materials on the analysis of bulk and single-cell ATAC-seq data in an interactive format that uses appropriate cloud resources for data access and analyses.

Citations: 0
RTCpredictor: identification of read-through chimeric RNAs from RNA sequencing data
IF 9.5, Zone 2 (Biology), Q1 Computer Science, Pub Date: 2024-05-26, DOI: 10.1093/bib/bbae251
Sandeep Singh, Xinrui Shi, Samuel Haddox, Justin Elfman, Syed Basil Ahmad, Sarah Lynch, Tommy Manley, Claire Piczak, Christopher Phung, Yunan Sun, Aadi Sharma, Hui Li
Read-through chimeric RNAs are being recognized as a means to expand the functional transcriptome and as contributors to tumorigenesis when mis-regulated. However, current software tools often fail to predict them. We have developed RTCpredictor, which uses the fast ripgrep tool to search for all possible exon-exon combinations of parental gene pairs. We also added exonic variants, allowing searches for sequences containing common SNPs. To our knowledge, it is the first prediction method specific to read-through chimeric RNAs that also provides breakpoint coordinates. Compared with 10 other popular tools, RTCpredictor achieved high sensitivity on one simulated and three real datasets. In addition, RTCpredictor has lower memory requirements and a faster execution time, making it well suited to large datasets.
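The idea of searching reads for every exon-exon combination of a parental gene pair can be sketched as follows. This is a minimal illustration, not RTCpredictor's implementation: a plain Python substring test stands in for ripgrep, the function names and the `flank` parameter are hypothetical, and reverse-complement reads and SNP-containing variants are not handled.

```python
from itertools import product


def junction_queries(upstream_exons, downstream_exons, flank=10):
    """Yield ((i, j), query) for every exon pair.

    Each query joins the last `flank` bases of upstream exon i to the
    first `flank` bases of downstream exon j.
    """
    pairs = product(enumerate(upstream_exons), enumerate(downstream_exons))
    for (i, up), (j, down) in pairs:
        yield (i, j), up[-flank:] + down[:flank]


def count_junction_reads(reads, upstream_exons, downstream_exons, flank=10):
    """Count reads that contain each junction query (exact match only)."""
    counts = {}
    for pair, query in junction_queries(upstream_exons, downstream_exons, flank):
        n = sum(query in read for read in reads)
        if n:
            counts[pair] = n
    return counts


if __name__ == "__main__":
    # Toy sequences for illustration only.
    up = ["AAAACCCC", "GGGGTTTT"]
    down = ["ACGTACGT", "TTTTAAAA"]
    reads = ["GGTTTTACGTAC", "AACCCCACGTAA", "ACGTACGT"]
    print(count_junction_reads(reads, up, down, flank=4))
```

Because the exon indices of each hit are known, the pair `(i, j)` identifies which exon boundaries form the junction, which is the kind of information needed to report breakpoint coordinates.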
Citations: 0
Benchmarking mapping algorithms for cell-type annotating in mouse brain by integrating single-nucleus RNA-seq and Stereo-seq data
IF 9.5, Zone 2 (Biology), Q1 Computer Science, Pub Date: 2024-05-26, DOI: 10.1093/bib/bbae250
Quyuan Tao, Yiheng Xu, Youzhe He, Ting Luo, Xiaoming Li, Lei Han
Limited gene capture efficiency and spot size of spatial transcriptome (ST) data pose significant challenges in cell-type characterization. The heterogeneity and complexity of cell composition in the mammalian brain make it even more challenging to accurately annotate ST data from the brain. Many algorithms attempt to characterize neuronal subtypes by integrating ST data with single-nucleus RNA sequencing (snRNA-seq) or single-cell RNA sequencing. However, the accuracy of these algorithms on Stereo-seq ST data has not yet been assessed. Here, we benchmarked 9 mapping algorithms using 10 ST datasets from four mouse brain regions at two different resolutions and 24 pseudo-ST datasets from snRNA-seq. Both actual ST data and pseudo-ST data were mapped using snRNA-seq datasets from the corresponding brain regions as reference data. After comparing performance across different areas and resolutions of the mouse brain, we conclude that robust cell-type decomposition and SpatialDWLS both demonstrated superior robustness and accuracy in cell-type annotation. Testing with publicly available snRNA-seq data from another sequencing platform in the cortex region further validated our conclusions. Altogether, we developed a workflow for assessing the suitability of mapping algorithms for ST datasets, which can improve the efficiency and accuracy of spatial data annotation.
Citations: 0
GLDADec: marker-gene guided LDA modeling for bulk gene expression deconvolution.
IF 6.8, Zone 2 (Biology), Q1 Biochemical Research Methods, Pub Date: 2024-05-23, DOI: 10.1093/bib/bbae315
Iori Azuma, Tadahaya Mizuno, Hiroyuki Kusuhara

Inferring cell type proportions from bulk transcriptome data is crucial in immunology and oncology. Here, we introduce guided LDA deconvolution (GLDADec), a bulk deconvolution method that guides topics using cell type-specific marker gene names to estimate topic distributions for each sample. Through benchmarking using blood-derived datasets, we demonstrate its high estimation performance and robustness. Moreover, we apply GLDADec to heterogeneous tissue bulk data and perform comprehensive cell type analysis in a data-driven manner. We show that GLDADec outperforms existing methods in estimation performance and evaluate its biological interpretability by examining enrichment of biological processes for topics. Finally, we apply GLDADec to The Cancer Genome Atlas tumor samples, enabling subtype stratification and survival analysis based on estimated cell type proportions, thus proving its practical utility in clinical settings. This approach, utilizing marker gene names as partial prior information, can be applied to various scenarios for bulk data deconvolution. GLDADec is available as an open-source Python package at https://github.com/mizuno-group/GLDADec.
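One common way to guide topics with marker gene names is to seed the topic-word prior so that each cell-type topic starts with extra pseudo-counts on its markers. The sketch below illustrates only that seeding step; it is an assumption about how such guidance can be set up, not the GLDADec package's API, and the function name and the `base` and `boost` parameters are hypothetical.

```python
def seeded_topic_priors(genes, markers, n_extra_topics=0, base=0.01, boost=1.0):
    """Build a topic-by-gene prior with boosted pseudo-counts for markers.

    genes:   ordered gene names (columns of the expression matrix)
    markers: {cell_type: [marker gene names]}, one guided topic per cell type
    Returns (topic_names, prior) where prior[t][g] is the pseudo-count
    of gene g in topic t.
    """
    index = {gene: col for col, gene in enumerate(genes)}
    extra = [f"extra_{k}" for k in range(n_extra_topics)]
    topic_names = list(markers) + extra
    prior = [[base] * len(genes) for _ in topic_names]
    for t, cell_type in enumerate(markers):
        for gene in markers[cell_type]:
            if gene in index:  # markers missing from the data are skipped
                prior[t][index[gene]] += boost
    return topic_names, prior


if __name__ == "__main__":
    # Toy gene panel for illustration only.
    genes = ["CD3E", "CD19", "LYZ", "ACTB"]
    markers = {"T cell": ["CD3E"], "B cell": ["CD19", "MS4A1"]}
    names, prior = seeded_topic_priors(genes, markers, n_extra_topics=1)
    for name, row in zip(names, prior):
        print(name, row)
```

The unguided extra topics keep a flat prior, which leaves room for cell types that have no marker list; an LDA sampler initialized with this prior would then estimate per-sample topic distributions as the proportion estimates.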

Citations: 0
Multi-modal domain adaptation for revealing spatial functional landscape from spatially resolved transcriptomics.
IF 9.5, Zone 2 (Biology), Q1 Computer Science, Pub Date: 2024-05-23, DOI: 10.1093/bib/bbae257
Lequn Wang, Yaofeng Hu, Kai Xiao, Chuanchao Zhang, Qianqian Shi, Luonan Chen

Spatially resolved transcriptomics (SRT) has emerged as a powerful tool for investigating gene expression in spatial contexts, providing insights into the molecular mechanisms underlying organ development and disease pathology. However, the expression sparsity poses a computational challenge to integrate other modalities (e.g. histological images and spatial locations) that are simultaneously captured in SRT datasets for spatial clustering and variation analyses. In this study, to meet such a challenge, we propose multi-modal domain adaption for spatial transcriptomics (stMDA), a novel multi-modal unsupervised domain adaptation method, which integrates gene expression and other modalities to reveal the spatial functional landscape. Specifically, stMDA first learns the modality-specific representations from spatial multi-modal data using multiple neural network architectures and then aligns the spatial distributions across modal representations to integrate these multi-modal representations, thus facilitating the integration of global and spatially local information and improving the consistency of clustering assignments. Our results demonstrate that stMDA outperforms existing methods in identifying spatial domains across diverse platforms and species. Furthermore, stMDA excels in identifying spatially variable genes with high prognostic potential in cancer tissues. In conclusion, stMDA as a new tool of multi-modal data integration provides a powerful and flexible framework for analyzing SRT datasets, thereby advancing our understanding of intricate biological systems.

Citations: 0
dCCA: detecting differential covariation patterns between two types of high-throughput omics data.
IF 6.8 Zone 2 Biology Q1 Computer Science Pub Date: 2024-05-23 DOI: 10.1093/bib/bbae288
Hwiyoung Lee, Tianzhou Ma, Hongjie Ke, Zhenyao Ye, Shuo Chen

Motivation: The advent of multimodal omics data has provided an unprecedented opportunity to systematically investigate underlying biological mechanisms from distinct yet complementary angles. However, the joint analysis of multi-omics data remains challenging because it requires modeling interactions between multiple sets of high-throughput variables. Furthermore, these interaction patterns may vary across different clinical groups, reflecting disease-related biological processes.

Results: We propose a novel approach called Differential Canonical Correlation Analysis (dCCA) to capture differential covariation patterns between two multivariate vectors across clinical groups. Unlike classical Canonical Correlation Analysis, which maximizes the correlation between two multivariate vectors, dCCA aims to maximally recover differentially expressed multivariate-to-multivariate covariation patterns between groups. We have developed computational algorithms and a toolkit to sparsely select paired subsets of variables from two sets of multivariate variables while maximizing the differential covariation. Extensive simulation analyses demonstrate the superior performance of dCCA in selecting variables of interest and recovering differential correlations. We applied dCCA to the Pan-Kidney cohort from The Cancer Genome Atlas Program database and identified differentially expressed covariations between noncoding RNAs and gene expression.

Availability and implementation: The R package that implements dCCA is available at https://github.com/hwiyoungstat/dCCA.
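
The idea of a differential covariation pattern can be illustrated with a small NumPy sketch. This is not the dCCA algorithm or the API of the R package: it omits the sparsity constraint and the variable selection described in the abstract, and simply takes the leading singular vectors of the difference between two group-wise cross-correlation matrices. The function names `cross_correlation` and `differential_covariation` and the simulated data are assumptions made for illustration only.

```python
import numpy as np


def cross_correlation(x, y):
    """Cross-correlation matrix between the columns of x (n x p) and y (n x q)."""
    xs = (x - x.mean(axis=0)) / x.std(axis=0)
    ys = (y - y.mean(axis=0)) / y.std(axis=0)
    return xs.T @ ys / x.shape[0]


def differential_covariation(x1, y1, x2, y2):
    """Unit-norm weights (u, v) maximizing |u^T (C1 - C2) v| (hypothetical helper).

    C1 and C2 are the cross-correlation matrices of group 1 and group 2.
    The maximizers are the leading singular vectors of C1 - C2.
    """
    delta = cross_correlation(x1, y1) - cross_correlation(x2, y2)
    left, singular_values, right_t = np.linalg.svd(delta, full_matrices=False)
    return left[:, 0], right_t[0], singular_values[0]


# Simulated data (illustrative only): two omics blocks measured in two groups.
rng = np.random.default_rng(0)
n, p, q = 500, 6, 5

# Group 2: the two blocks are independent.
x2 = rng.normal(size=(n, p))
y2 = rng.normal(size=(n, q))

# Group 1: the first variable of each block shares a latent factor z.
z = rng.normal(size=n)
x1 = rng.normal(size=(n, p))
y1 = rng.normal(size=(n, q))
x1[:, 0] += 2 * z
y1[:, 0] += 2 * z

u, v, strength = differential_covariation(x1, y1, x2, y2)
print("largest |u| weight at variable", int(np.argmax(np.abs(u))))
print("largest |v| weight at variable", int(np.argmax(np.abs(v))))
```

In this toy setting the largest weights fall on the pair of variables whose correlation differs between the groups. The method described in the abstract additionally selects sparse paired subsets of variables, which this sketch does not attempt.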

Citations: 0
Advancing drug-response prediction using multi-modal and -omics machine learning integration (MOMLIN): a case study on breast cancer clinical data.
IF 6.8 Zone 2 Biology Q1 Computer Science Pub Date: 2024-05-23 DOI: 10.1093/bib/bbae300
Md Mamunur Rashid, Kumar Selvarajoo

The inherent heterogeneity of cancer contributes to highly variable responses to anticancer treatments. This underscores the need to first identify precise biomarkers from the complex multi-omics datasets that are now available. Although much research has focused on this aspect, identifying biomarkers associated with distinct drug responders remains a major challenge. Here, we develop MOMLIN, a multi-modal and -omics machine learning integration framework, to enhance drug-response prediction. MOMLIN jointly utilizes sparse correlation algorithms and class-specific feature selection algorithms to identify multi-modal and -omics-associated interpretable components. MOMLIN was applied to breast cancer datasets from 147 patients (clinical, mutation, gene expression, tumor microenvironment cells and molecular pathways) to analyze drug-response class predictions for non-responders and variable responders. Notably, MOMLIN achieves an average AUC of 0.989, which is at least 10% greater than that of current state-of-the-art methods (data integration analysis for biomarker discovery using latent components, multi-omics factor analysis, sparse canonical correlation analysis). Moreover, MOMLIN not only detects known individual biomarkers, such as genes at the mutation/expression level, but, most importantly, also correlates multi-modal and -omics network biomarkers for each response class. For example, an interaction between ER-negative-HMCN1-COL5A1 mutations-FBXO2-CSF3R expression-CD8 emerges as a multimodal biomarker for responders, potentially affecting antimicrobial peptides and FLT3 signaling pathways. In contrast, for resistance cases, a distinct combination of lymph node-TP53 mutation-PON3-ENSG00000261116 lncRNA expression-HLA-E-T-cell exclusions emerged as a multimodal biomarker, possibly impacting the neurotransmitter release cycle pathway. MOMLIN is therefore expected to advance precision medicine, for example by detecting context-specific multi-omics network biomarkers and better predicting drug-response classifications.

Citations: 0
Balancing Clinical Applicability and Scientific Depth in ML Models for MDA5-DM Prognosis.
IF 6.8 Zone 2 Biology Q1 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-05-23 DOI: 10.1093/bib/bbae295
Emily McLeish, Nataliya Slater, Frank L Mastaglia, Merrilee Needham, Jerome D Coudert
Citations: 0