Title: Ensemble Classification Model With CFS-IGWO-Based Feature Selection for Cancer Detection Using Microarray Data
Authors: Pinakshi Panda, Sukant Kishoro Bisoy, Sandeep Kautish, Reyaz Ahmad, Asma Irshad, Nadeem Sarwar
Journal: International Journal of Telemedicine and Applications (impact factor 3.1; JCR Q2, Health Care Sciences & Services)
Publication date: 2024-10-17
Publication type: Journal Article
DOI: 10.1155/2024/4105224 (https://doi.org/10.1155/2024/4105224)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11502127/pdf/
Citation count: 0
Abstract
Cancer is a leading cause of death worldwide, and machine learning (ML) has become an important tool for early cancer detection, which helps lower the death toll. ML-based models for cancer diagnosis are built using two forms of data: gene expression data and microarray data. Gene expression data has many dimensions, and the efficiency of an ML-based model decreases on high-dimensional data. Microarray data is characterized by high dimensionality: a large number of features and a small sample size. In this work, two ensemble techniques are proposed, one based on majority voting and one based on weighted averaging. Correlation feature selection (CFS) is used for feature selection, and an improved grey wolf optimizer (IGWO) is used for feature optimization. Support vector machine (SVM), multilayer perceptron (MLP), logistic regression (LR), decision tree (DT), adaptive boosting (AdaBoost), extreme learning machine (ELM), and K-nearest neighbor (KNN) classifiers are used as base learners. The outputs of the individual base learners are then combined using the weighted average and majority voting ensemble methods. Performance is assessed using accuracy (ACC), specificity (SPE), sensitivity (SEN), precision (PRE), Matthews correlation coefficient (MCC), and F1-score (F1-S). Our results show that majority voting achieves better performance than the weighted average ensemble technique.
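The two combination rules named in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the weights are chosen or whether the base learners emit labels or probabilities, so everything here is an assumption made for illustration: the function names `majority_vote` and `weighted_average`, the 0.5 threshold, the tie-breaking rule, and the probabilities and weights themselves are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Sequence


def majority_vote(labels: Sequence[int]) -> int:
    """Return the most frequent label; ties go to the smallest label (an assumption)."""
    counts = Counter(labels)
    best = max(counts.values())
    return min(label for label, n in counts.items() if n == best)


def weighted_average(probs: Sequence[float], weights: Sequence[float],
                     threshold: float = 0.5) -> int:
    """Threshold the weighted mean of positive-class probabilities."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(p * w for p, w in zip(probs, weights)) / total
    return int(score >= threshold)


# Hypothetical positive-class probabilities from seven base learners for one sample.
probs = [0.9, 0.4, 0.45, 0.3, 0.8, 0.45, 0.2]
# Hypothetical weights, for example each learner's validation accuracy.
weights = [0.95, 0.6, 0.6, 0.6, 0.9, 0.6, 0.6]

# Majority voting works on hard labels, so each probability is thresholded first.
hard_labels = [int(p >= 0.5) for p in probs]
vote = majority_vote(hard_labels)
avg = weighted_average(probs, weights)
print(f"majority vote: {vote}, weighted average: {avg}")
```

With these made-up numbers the two rules disagree: five of the seven learners vote for class 0, while the two confident, highly weighted learners pull the weighted mean above the threshold. Which rule performs better on real microarray data is an empirical question; the abstract reports that majority voting did better in the authors' experiments.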
About the journal:
The overall aim of the International Journal of Telemedicine and Applications is to bring together the science and applications of medical practice and medical care at a distance, as well as their supporting technologies such as computing, communications, and networking, with emphasis on telemedicine techniques and telemedicine applications. It is directed at practicing engineers and academic researchers, as well as doctors, nurses, and other healthcare professionals. Telemedicine is an information technology that enables doctors to perform medical consultations, diagnoses, and treatments, as well as medical education, at a distance from patients. For example, doctors can remotely examine patients via remote viewing monitors and sound devices, and/or sample physiological data using telecommunication. Telemedicine technology is applied to areas such as emergency healthcare, videoconsulting, telecardiology, telepathology, teledermatology, teleophthalmology, teleoncology, telepsychiatry, and teledentistry. International Journal of Telemedicine and Applications will highlight the continued growth and new challenges in telemedicine, its applications, and their supporting technologies, for both application development and basic research. Papers should emphasize original results or case studies relating to the theory and/or applications of telemedicine. Tutorial papers, especially those emphasizing multidisciplinary views of telemedicine, are also welcome. International Journal of Telemedicine and Applications employs a paperless, electronic submission and evaluation system to promote a rapid turnaround in the peer-review process.