Screening of Serum miRNAs as Diagnostic Biomarkers for Lung Cancer Using the Minimal-Redundancy-Maximal-Relevance Algorithm and Random Forest Classifier Based on a Public Database.

Public Health Genomics | Pub Date: 2022-08-02 | DOI: 10.1159/000525316 | IF 1.3 | Zone 4 (Medicine) | JCR Q4, Genetics & Heredity
Xiaoyan Huang, Xiong Chen, Xi Chen, Wenling Wang
Citations: 0

Abstract

Background: Lung cancer is one of the deadliest cancers, and early diagnosis can substantially improve patient survival. We aimed to screen serum miRNAs as diagnostic biomarkers for patients with lung cancer.

Methods: A total of 416 significantly differentially expressed miRNAs were identified using the limma package, and these features were then ranked by the minimal-redundancy-maximal-relevance (mRMR) method. Incremental feature selection (IFS) with a random forest (RF) classifier was used to select the combination of the top 5 miRNAs, which gave the best predictive performance. The performance of the RF classifier built on the top 5 miRNAs was evaluated with the receiver operating characteristic (ROC) curve. The classification effect of the 5-miRNA combination was then validated by principal component analysis (PCA) and hierarchical clustering analysis. Expression of the top 5 miRNAs was compared between lung cancer patients and healthy controls in the GSE137140 dataset and validated by qPCR. Hierarchical clustering analysis was also used to assess the similarity of the expression profiles of the 5 miRNAs. Finally, ROC analysis was performed on each miRNA individually.
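The ranking and selection steps in the Methods can be sketched in Python. This is a minimal, dependency-free illustration under stated assumptions, not the authors' code: it assumes expression values have already been discretized (a common preprocessing step for mRMR), uses the MID form of mRMR (mutual-information relevance minus mean redundancy), and replaces the random forest with a caller-supplied `evaluate` callback. The function names `mutual_information`, `mrmr_rank`, and `incremental_feature_selection` are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Hashable, List, Sequence, Tuple


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(X; Y) in bits for two equal-length discrete sequences."""
    n = len(x)
    count_x = Counter(x)
    count_y = Counter(y)
    count_xy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        # p(a,b) * log2(p(a,b) / (p(a) * p(b))), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[a] * count_y[b]))
    return mi


def mrmr_rank(features: List[Sequence[Hashable]], labels: Sequence[Hashable]) -> List[int]:
    """Greedy mRMR ranking (MID criterion): relevance minus mean redundancy.

    Hypothetical helper: features[j] holds the discretized values of
    feature j across all samples. Returns feature indices in selection order.
    """
    relevance = [mutual_information(f, labels) for f in features]
    remaining = list(range(len(features)))
    selected: List[int] = []
    while remaining:
        def score(j: int) -> float:
            if not selected:
                return relevance[j]  # first pick: the most relevant feature
            redundancy = sum(
                mutual_information(features[j], features[s]) for s in selected
            ) / len(selected)
            return relevance[j] - redundancy
        best = max(remaining, key=score)  # ties resolve to the lower index
        selected.append(best)
        remaining.remove(best)
    return selected


def incremental_feature_selection(
    ranking: List[int], evaluate: Callable[[List[int]], float]
) -> Tuple[List[int], float]:
    """Score every prefix of the ranking and keep the smallest best-scoring prefix.

    `evaluate` is a hypothetical callback, e.g. the cross-validated MCC of a
    classifier trained on the given feature indices.
    """
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(ranking) + 1):
        s = evaluate(ranking[:k])
        if s > best_score:  # strict '>' keeps the smaller subset on ties
            best_k, best_score = k, s
    return ranking[:best_k], best_score
```

In a pipeline like the one described, `evaluate` would train an RF on the listed miRNA columns and return a score such as MCC; IFS then keeps the shortest prefix of the mRMR ranking that reaches the best score. Because a feature that duplicates an already selected one is penalized by its redundancy term, near-copies of a strong miRNA tend to fall to the end of the ranking.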

Results: We finally obtained the top 5 miRNAs, whose combination achieved a Matthews correlation coefficient (MCC) of 0.988 and an area under the curve (AUC) of 0.996. The 5 feature miRNAs distinguished most cancer patients from healthy individuals. Except for miR-6875-5p, which was expressed at low levels in lung cancer tissue, the other 4 miRNAs were all highly expressed in cancer patients. When evaluated individually, the 5 miRNAs yielded AUC values of 0.92, 0.96, 0.94, 0.95, and 0.93, respectively.
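The two summary metrics reported in the Results can be computed from their standard definitions. The sketch below assumes binary labels coded 1 (cancer) and 0 (control), hard class predictions for MCC, and continuous classifier scores for AUC; the abstract does not state how predictions were generated (for example, by cross-validation), so the helper names `matthews_corrcoef` and `roc_auc` and their inputs are illustrative only. AUC uses the Mann-Whitney pairwise formulation, counting ties as one half.

```python
import math
from typing import Sequence


def matthews_corrcoef(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """MCC for binary labels coded 1 (positive, e.g. cancer) and 0 (negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when any confusion-matrix margin is empty (MCC undefined).
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = P(score of a random positive > score of a random negative), ties = 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, `matthews_corrcoef([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1])` gives 3/9 (about 0.333), and `roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])` gives 0.75 because 3 of the 4 positive-negative pairs are ordered correctly. The pairwise loop is quadratic but transparent; for large cohorts a rank-sum formulation computes the same value faster.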

Conclusion: Overall, the 5 feature miRNAs identified here are anticipated to serve as effective diagnostic biomarkers for lung cancer.

Source journal
Public Health Genomics
Medicine: Public, Environmental & Occupational Health
CiteScore: 2.90
Self-citation rate: 0.00%
Articles published: 14
Review time: >12 weeks
About the journal: "Public Health Genomics" is the leading international journal focusing on the timely translation of genome-based knowledge and technologies into public health, health policies, and healthcare as a whole. This peer-reviewed journal is a bimonthly forum featuring original papers, reviews, short communications, and policy statements. It is supplemented by topic-specific issues providing a comprehensive, holistic and "all-inclusive" picture of the chosen subject. Multidisciplinary in scope, it combines theoretical and empirical work from a range of disciplines, notably public health, molecular and medical sciences, the humanities and social sciences. In so doing, it also takes into account rapid scientific advances from fields such as systems biology, microbiomics, epigenomics, or information and communication technologies, as well as the high potential of "big data" for public health.
Latest articles in this journal
"The Biggest Struggle:" Navigating Trust and Uncertainty in Genetic Variant Interpretation.
"Should I let them know I have this?": Multifaceted genetic discrimination and limited awareness of legal protections amongst individuals with hereditary cancer syndromes.
Who's on your genomics research team? Consumer experiences from Australia.
Development and Pilot Testing of Evidence-Based Interventions to Improve Adherence after Receiving a Genetic Result.
Co-creating the experience of consent for newborn genome sequencing (The Generation Study).