AEGAN-Pathifier: a data augmentation method to improve cancer classification for imbalanced gene expression data.

IF 2.9 · Zone 3 (Biology) · JCR Q2 (Biochemical Research Methods) · BMC Bioinformatics · Pub Date: 2024-12-27 · DOI: 10.1186/s12859-024-06013-z
Qiaosheng Zhang, Yalong Wei, Jie Hou, Hongpeng Li, Zhaoman Zhong
BMC Bioinformatics 25(1): 392. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11673641/pdf/
Citations: 0

Abstract

Background: Cancer classification has long been a challenging problem, chiefly because of high-dimensional data and the difficulty of collecting patient samples. Obtaining patient samples is costly and resource-intensive, and the resulting classes are often imbalanced. Moreover, gene expression data combine high dimensionality, small sample sizes and high noise, which can easily lead to problems such as the curse of dimensionality and overfitting. To address these difficulties, we incorporate prior knowledge from biological pathways and combine an AutoEncoder with a Generative Adversarial Network (GAN).

Results: In this study, we propose an effective and efficient deep learning method, named AEGAN, which combines an AutoEncoder with a GAN to generate synthetic minority-class samples for imbalanced gene expression data. This data balancing technique is shown to be useful for cancer classification and to improve the performance of classifier models. Additionally, we integrate prior knowledge from biological pathways and use the Pathifier algorithm to compute pathway scores for each sample. The resulting data augmentation approach, AEGAN-Pathifier, both preserves the biological meaning of the data and reduces its dimensionality. Experiments with a range of classifiers show improved classification performance.
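The Results paragraph describes a two-stage data flow: balance the classes by generating synthetic minority samples, then collapse each sample's genes into one score per pathway. The Python sketch below illustrates that flow only, under several assumptions. The classes are binary. Balancing happens before scoring (the abstract does not state the order). The generator is a hypothetical `interpolate_pair` stand-in that linearly interpolates between two random minority samples; it is not the paper's trained AutoEncoder and GAN. `pathway_scores` is a simplified Pathifier-style score that replaces Pathifier's principal curve with a straight first-principal-component line and measures each sample's distance from the mean position of a reference class. All function names, the toy data and the gene sets are hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Row = List[float]
# A generator maps (minority rows, rng) -> one synthetic row. AEGAN would plug a
# trained AutoEncoder + GAN in here; this sketch uses a simple stand-in instead.
Generator = Callable[[List[Row], random.Random], Row]


def interpolate_pair(rows: List[Row], rng: random.Random) -> Row:
    """Hypothetical stand-in generator (NOT AEGAN): a random point on the line
    segment between two randomly chosen minority samples."""
    a, b = rng.sample(rows, 2) if len(rows) > 1 else (rows[0], rows[0])
    t = rng.random()
    return [ai + t * (bi - ai) for ai, bi in zip(a, b)]


def balance_classes(X: List[Row], y: List[int], generate: Generator,
                    minority: int, seed: int = 0) -> Tuple[List[Row], List[int]]:
    """Binary case: append synthetic minority rows until both classes are equal."""
    rng = random.Random(seed)
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    n_majority = len(y) - len(minority_rows)
    n_needed = max(0, n_majority - len(minority_rows))
    synthetic = [generate(minority_rows, rng) for _ in range(n_needed)]
    return X + synthetic, y + [minority] * n_needed


def first_pc(rows: List[Row]) -> Tuple[Row, Row]:
    """Column means and first principal axis, by power iteration on X_c^T X_c.
    Assumes the all-ones start vector is not orthogonal to that axis."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    v = [1.0] * d
    for _ in range(100):
        proj = [sum(c[j] * v[j] for j in range(d)) for c in centered]  # X_c v
        w = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(d)]  # X_c^T X_c v
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # constant data or degenerate start vector
            break
        v = [x / norm for x in w]
    return mean, v


def pathway_scores(X: List[Row], y: List[int], genes: Sequence[int],
                   reference_label: int) -> List[float]:
    """Simplified Pathifier-style score for one pathway: each sample's distance
    along PC1 (a straight line, not Pathifier's principal curve) from the mean
    position of the reference samples, rescaled so the largest score is 1."""
    rows = [[x[g] for g in genes] for x in X]
    mean, v = first_pc(rows)
    t = [sum((r[j] - mean[j]) * v[j] for j in range(len(v))) for r in rows]
    ref_t = [ti for ti, label in zip(t, y) if label == reference_label]
    ref = sum(ref_t) / len(ref_t)
    dist = [abs(ti - ref) for ti in t]
    top = max(dist) or 1.0
    return [di / top for di in dist]


def to_pathway_features(X: List[Row], y: List[int], pathways: Dict[str, List[int]],
                        reference_label: int) -> List[Row]:
    """Replace each sample's gene vector with one score per pathway."""
    columns = [pathway_scores(X, y, genes, reference_label)
               for genes in pathways.values()]
    return [list(row) for row in zip(*columns)]


# Toy data: 6 samples x 4 genes; label 0 = majority/reference, 1 = minority.
X = [[1.0, 2.0, 0.5, 0.4], [1.2, 1.9, 0.6, 0.5], [0.9, 2.1, 0.4, 0.6],
     [1.1, 2.2, 0.5, 0.5], [3.0, 0.5, 2.0, 2.2], [3.2, 0.4, 2.1, 2.4]]
y = [0, 0, 0, 0, 1, 1]
pathways = {"pathway_A": [0, 1], "pathway_B": [2, 3]}  # hypothetical gene sets

X_bal, y_bal = balance_classes(X, y, interpolate_pair, minority=1)
features = to_pathway_features(X_bal, y_bal, pathways, reference_label=0)
print(len(X), "x", len(X[0]), "->", len(features), "x", len(features[0]))
```

Running it prints `6 x 4 -> 8 x 2`: two synthetic rows bring the classes to 4 and 4, and the four genes collapse into two pathway features, which is the sense in which pathway scoring reduces dimensionality. Replacing `interpolate_pair` with a trained generator only requires a function with the same `(rows, rng) -> row` signature.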

Conclusion: AEGAN-Pathifier improves performance on the imbalanced datasets GSE25066, GSE20194, BRCA and Liver24. Results from several classifiers indicate that AEGAN-Pathifier generalizes well.

Source journal: BMC Bioinformatics (Biology: Biochemical Research Methods)
CiteScore: 5.70
Self-citation rate: 3.30%
Articles published: 506
Review time: 4.3 months
Journal description: BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology. BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.