Classification-based pathway analysis using GPNet with novel P-value computation.

IF 7.7 | Zone 2, Biology | JCR Q1, Biochemical Research Methods | Briefings in Bioinformatics | Pub Date: 2024-11-22 | DOI: 10.1093/bib/bbaf039
Hao Lu, Mostafa Rezapour, Haseebullah Baha, Muhammad Khalid Khan Niazi, Aarthi Narayanan, Metin Nafi Gurcan
Briefings in Bioinformatics, Volume 26, Issue 1. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11775473/pdf/

Abstract

Pathway analysis plays a critical role in bioinformatics, enabling researchers to identify biological pathways associated with various conditions by analyzing gene expression data. However, the rise of large, multi-center datasets has highlighted limitations in traditional methods such as Over-Representation Analysis (ORA) and Functional Class Scoring (FCS), which struggle with low signal-to-noise ratios (SNR) and large sample sizes. To tackle these challenges, we use a deep learning-based classification method, Gene PointNet (GPNet), and a novel P-value computation approach that leverages the confusion matrix to address pathway analysis tasks. We validated the effectiveness of our method through a comparative study using a simulated dataset and RNA-Seq data from The Cancer Genome Atlas breast cancer dataset. Our method was benchmarked against traditional techniques (ORA, FCS), shallow machine learning models (logistic regression, support vector machine), and deep learning approaches (DeepHisCom, PASNet). The results demonstrate that GPNet outperforms these methods in low-SNR, large-sample datasets, where it remains robust and reliable, significantly reducing Type I error while improving power. This makes our method well suited for pathway analysis in large, multi-center studies. The code can be found at https://github.com/haolu123/GPNet_pathway.
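The abstract does not spell out how the P-value is obtained from the confusion matrix. As a generic illustration only, and not necessarily the authors' procedure, one standard way to turn a 2x2 confusion matrix into a P-value is a one-sided Fisher's exact test: under the null hypothesis that predicted labels are independent of true labels, the number of true positives follows a hypergeometric distribution. The function name `fisher_exact_greater` and the example counts below are hypothetical.

```python
from math import comb


def fisher_exact_greater(tp: int, fn: int, fp: int, tn: int) -> float:
    """One-sided Fisher's exact test on a 2x2 confusion matrix.

    Returns P(X >= tp) under the hypergeometric null in which predicted
    labels are independent of true labels, with all margins held fixed.
    This is a generic sketch, not the GPNet P-value computation.
    """
    n = tp + fn + fp + tn      # total number of evaluated samples
    actual_pos = tp + fn       # row margin: truly positive samples
    predicted_pos = tp + fp    # column margin: samples predicted positive
    denom = comb(n, predicted_pos)
    upper = min(actual_pos, predicted_pos)
    # Sum the probabilities of every table at least as extreme as the
    # observed one (that is, with at least tp true positives).
    tail = sum(
        comb(actual_pos, k) * comb(n - actual_pos, predicted_pos - k)
        for k in range(tp, upper + 1)
    )
    return tail / denom


# Hypothetical held-out confusion matrix for a single pathway classifier.
p_value = fisher_exact_greater(tp=3, fn=1, fp=1, tn=3)
print(f"P-value: {p_value:.4f}")
```

In a classification-based setting, one would presumably compute such a matrix on held-out samples for a classifier trained on the genes of one pathway; a small P-value would then suggest that those genes carry information about the condition. That reading is an assumption based on the abstract, not a description of the published pipeline.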

