Screening of Serum miRNAs as Diagnostic Biomarkers for Lung Cancer Using the Minimal-Redundancy-Maximal-Relevance Algorithm and Random Forest Classifier Based on a Public Database.
Xiaoyan Huang, Xiong Chen, Xi Chen, Wenling Wang. Public Health Genomics, published 2022-08-02. DOI: 10.1159/000525316
Background: Lung cancer is one of the deadliest cancers, and early diagnosis can effectively improve patient survival. We aimed to identify serum miRNAs that could serve as diagnostic biomarkers for lung cancer.
Methods: A total of 416 significantly differentially expressed miRNAs were identified using the limma package, and these features were then ranked with the minimal-redundancy-maximal-relevance (mRMR) method. Incremental feature selection (IFS) with a random forest (RF) classifier was used to select the top 5 miRNAs as the combination with the best predictive performance, and the performance of the RF classifier built on these 5 miRNAs was assessed with receiver operating characteristic (ROC) curves. The classification ability of the 5-miRNA combination was further validated by principal component analysis and hierarchical clustering. Expression of the 5 miRNAs in lung cancer patients and healthy controls was compared using the GSE137140 dataset and validated by qPCR. Hierarchical clustering was also used to assess the similarity of the expression profiles of the 5 miRNAs, and ROC analysis was performed for each miRNA individually.
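The Methods chain two steps: mRMR ranks the candidate miRNAs, then incremental feature selection (IFS) scores ever-larger prefixes of that ranking with a classifier and keeps the best prefix. A minimal pure-Python sketch of that chain follows. It assumes expression values have already been discretized (for example into low/high bins), uses the MID form of mRMR (relevance minus mean redundancy, both measured as mutual information), and takes the classifier evaluation as a callable. The function names are hypothetical and this is not the authors' code; the abstract does not state their mRMR variant, discretization, or evaluation scheme.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def mrmr_rank(features, labels):
    """Greedy mRMR ranking with the MID criterion (hypothetical helper).

    features: dict of feature name -> list of discretized values, one per sample
              (assumption: discretization was done beforehand)
    labels:   list of class labels, e.g. 1 = cancer, 0 = control
    Returns feature names in the order they were selected.
    """
    relevance = {name: mutual_information(v, labels) for name, v in features.items()}
    remaining, ranked = set(features), []
    while remaining:
        def score(name):
            if not ranked:
                return relevance[name]
            # Mean redundancy with the features already selected.
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in ranked
            ) / len(ranked)
            return relevance[name] - redundancy
        # Iterating over sorted names makes ties resolve deterministically.
        best = max(sorted(remaining), key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked


def incremental_feature_selection(ranked, evaluate):
    """Score each top-k prefix of a ranking and return (best_k, best_score).

    evaluate: callable mapping a list of feature names to a performance score,
              e.g. the cross-validated MCC of a classifier trained on them.
    """
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(ranked) + 1):
        score = evaluate(ranked[:k])
        if score > best_score:  # strict '>' keeps the smallest k on ties
            best_k, best_score = k, score
    return best_k, best_score
```

To mirror the paper's setup more closely, `evaluate` could wrap scikit-learn's `RandomForestClassifier` together with `cross_val_predict` and `matthews_corrcoef`; the fold scheme and forest parameters the authors used are not given in the abstract, so any such choice is an assumption.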
Results: The final top-5 miRNA combination achieved a Matthews correlation coefficient (MCC) of 0.988 and an area under the curve (AUC) of 0.996. These 5 feature miRNAs distinguished most cancer patients from healthy controls. Except for miR-6875-5p, which showed low expression in lung cancer tissue, the other 4 miRNAs were all highly expressed in cancer patients. In the per-miRNA ROC analysis, the individual AUC values of the 5 miRNAs were 0.92, 0.96, 0.94, 0.95, and 0.93.
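The Results summarize classifier performance with two metrics, MCC and ROC AUC. For reference, a minimal pure-Python sketch of both using their standard definitions: MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), and AUC as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one (ties count as half). The function names are illustrative, and the abstract does not say how the predictions behind the reported values were produced (for example, which cross-validation scheme was used).

```python
from math import sqrt


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = cancer, 0 = control)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when a marginal count is zero and MCC is undefined.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true, scores):
    """ROC AUC via pairwise comparison of positive and negative scores.

    Ties count as half a win. This O(n_pos * n_neg) form is for illustration only.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

For example, `auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])` compares four positive/negative pairs, three of which are ordered correctly, giving 0.75.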
Conclusion: Overall, the 5 feature miRNAs identified here are expected to serve as effective diagnostic biomarkers for lung cancer.
Journal introduction:
Public Health Genomics is the leading international journal focusing on the timely translation of genome-based knowledge and technologies into public health, health policies, and healthcare as a whole. This peer-reviewed journal is a bimonthly forum featuring original papers, reviews, short communications, and policy statements. It is supplemented by topic-specific issues providing a comprehensive, holistic and "all-inclusive" picture of the chosen subject. Multidisciplinary in scope, it combines theoretical and empirical work from a range of disciplines, notably public health, molecular and medical sciences, the humanities and social sciences. In so doing, it also takes into account rapid scientific advances from fields such as systems biology, microbiomics, epigenomics, and information and communication technologies, as well as the high potential of "big data" for public health.