Stacking Ensemble Method for Early and Advanced Stage Lung Adenocarcinoma Classification Based on miRNA Expression

Adeel Khan, N. He, Irfan Tariq, Zhiyang Li
Published in: Proceedings of the 2021 10th International Conference on Bioinformatics and Biomedical Science
Publication date: 2021-10-29
DOI: 10.1145/3498731.3498742 (https://doi.org/10.1145/3498731.3498742)

Abstract

Lung cancer, in its various types, is a leading cause of death across the globe. Many studies have pointed out that dysregulation of microRNAs (miRNAs) can be a useful marker for a variety of cancers, including lung cancer. Successful treatment of any cancer depends on clinical expertise, treatment resources, and the stage at the time of diagnosis. We therefore set out to find a novel miRNA expression marker for determining the stage of lung adenocarcinoma (LUAD). In this manuscript, we propose a stacking ensemble method for classifying early- and advanced-stage LUAD using miRNA expression data. Our benchmark dataset contained 445 early-stage and 114 advanced-stage LUAD patients. Because the benchmark dataset was imbalanced, we balanced it with the Synthetic Minority Over-sampling Technique (SMOTE). We then divided the balanced LUAD patient dataset into a training set (80%) and a testing set (20%). A Random Forest (RF) was used to select the best features (miRNA sequence expression) out of 1880 miRNAs, followed by a machine learning (ML) stacking ensemble to classify early- and advanced-stage LUAD. Compared to the traditional ML classifier used as a baseline, the stacking ensemble classified early- and advanced-stage LUAD more efficiently, with 99% accuracy. The proposed method's precision was 92% for early-stage LUAD and 84% for advanced-stage LUAD. Similarly, its recall for early- and advanced-stage LUAD was 82% and 93%, respectively, and its F1-score was 87% and 88%, respectively. To conclude, the results clearly show the effectiveness of the ensemble method for classifying early- and advanced-stage LUAD using miRNA expression data. The top 10 miRNA sequences identified by the model can help in making the best treatment decisions for early- and advanced-stage LUAD and thereby increase the chances of survival.
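The balancing step and the reported scores can be illustrated with a short sketch. It is not the authors' code: the abstract does not give SMOTE settings, so the neighbour count `k=5`, the three-feature toy data, and the helper names `smote_like` and `f1` are all assumptions made for illustration. The sketch shows the core SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours), the sample counts implied by the 445/114 class sizes and the 80/20 split, and that the reported F1-scores follow from the reported precision and recall.

```python
import random


def smote_like(minority, n_new, k=5, seed=0):
    """Return n_new synthetic rows, each interpolated between a random
    minority row and one of its k nearest minority neighbours.
    Illustrative only; k=5 is an assumed setting, not from the paper."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by squared Euclidean distance to `base`.
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Class sizes stated in the abstract.
n_early, n_advanced = 445, 114
n_needed = n_early - n_advanced  # synthetic advanced-stage rows required

# Toy stand-in for advanced-stage miRNA expression (random values, 3 features).
rng = random.Random(42)
advanced = [[rng.random() for _ in range(3)] for _ in range(n_advanced)]
balanced_advanced = advanced + smote_like(advanced, n_needed)

# 80/20 split sizes of the balanced dataset (integer arithmetic).
n_total = n_early + len(balanced_advanced)
n_train = n_total * 8 // 10
n_test = n_total - n_train

print(n_needed, n_total, n_train, n_test)
print(round(f1(0.92, 0.82), 2), round(f1(0.84, 0.93), 2))
```

In practice a library implementation such as the `SMOTE` class in imbalanced-learn would normally replace the hand-written helper; the pure-Python version here only exposes the interpolation step.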