Meta-Learning on Augmented Gene Expression Profiles for Enhanced Lung Cancer Detection
Arya Hadizadeh Moghaddam, Mohsen Nayebi Kerdabadi, Cuncong Zhong, Zijun Yao
arXiv - QuanBio - Genomics, 2024-08-19, arXiv:2408.09635
Abstract
Gene expression profiles obtained through DNA microarrays have proven
successful in providing critical information for cancer detection classifiers.
However, the limited number of samples in these datasets makes it challenging to
employ complex methodologies such as deep neural networks for sophisticated
analysis. To address this "small data" dilemma, meta-learning has been
introduced as a solution to enhance the optimization of machine learning models
by utilizing similar datasets, thereby facilitating a quicker adaptation to
target datasets without requiring a large number of samples. In this study,
we present a meta-learning-based approach for predicting lung cancer from gene
expression profiles. We apply this framework to well-established deep learning
methodologies and employ four distinct datasets for the meta-learning tasks,
with one serving as the target dataset and the rest as source datasets. Our approach
is evaluated against both traditional and deep learning methodologies, and the
results show the superior performance of meta-learning on augmented source data
compared to the baselines trained on single datasets. Moreover, we conduct a
comparative analysis between meta-learning and transfer learning methodologies
to highlight the efficiency of the proposed approach in addressing the
challenges associated with limited sample sizes. Finally, we incorporate an
explainability study to illustrate the distinctiveness of decisions made by
meta-learning.
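
The source/target setup described in the abstract can be illustrated with a minimal sketch. The abstract does not name the meta-learning algorithm, the network architecture, or the augmentation scheme, so everything here is an assumption made for illustration: Reptile (a first-order meta-learning method) stands in for the authors' procedure, a logistic-regression model stands in for a deep network, and the three source tasks and one small target task are synthetic rather than real gene expression data. All function and variable names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_steps(w, X, y, lr=0.1, steps=5):
    """Run a few gradient steps of logistic regression, starting from w."""
    w = w.copy()
    for _ in range(steps):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)
        w -= lr * grad
    return w

def reptile(source_tasks, dim, meta_lr=0.5, meta_iters=200, seed=0):
    """Reptile meta-training (an assumed stand-in, not the paper's method):
    repeatedly adapt to one source task, then move the shared
    initialization part of the way toward the adapted weights."""
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)
    for _ in range(meta_iters):
        X, y = source_tasks[rng.integers(len(source_tasks))]
        w_task = sgd_steps(w, X, y)
        w += meta_lr * (w_task - w)
    return w

def make_task(rng, w_true, n, dim):
    """Synthetic binary task: labels come from a noisy linear rule."""
    X = rng.normal(size=(n, dim))
    y = (X @ w_true + 0.1 * rng.normal(size=n) > 0).astype(float)
    return X, y

def accuracy(w, X, y):
    return float(np.mean((sigmoid(X @ w) > 0.5) == (y == 1.0)))

rng = np.random.default_rng(42)
dim = 20  # hypothetical number of features (genes)
w_shared = rng.normal(size=dim)

# Three related source tasks and one target task with only 10 labeled samples.
sources = [
    make_task(rng, w_shared + 0.3 * rng.normal(size=dim), 100, dim)
    for _ in range(3)
]
w_target = w_shared + 0.3 * rng.normal(size=dim)
X_few, y_few = make_task(rng, w_target, 10, dim)
X_test, y_test = make_task(rng, w_target, 500, dim)

# Meta-train on the sources, then adapt to the target with a few steps.
w_meta = reptile(sources, dim)
w_adapted = sgd_steps(w_meta, X_few, y_few)

# Baseline: the same few steps on the target, starting from zeros.
w_scratch = sgd_steps(np.zeros(dim), X_few, y_few)

print("meta-learned init + adaptation:", accuracy(w_adapted, X_test, y_test))
print("training from scratch:         ", accuracy(w_scratch, X_test, y_test))
```

The two printed accuracies compare adaptation from a meta-learned initialization against the same number of gradient steps from scratch. On real microarray data the outcome would depend on how closely the source datasets resemble the target, which this toy example does not model.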