Feature selection using differential evolution for microarray data classification

Discover Internet of Things | Pub Date: 2023-10-05 | DOI: 10.1007/s43926-023-00042-5

Sanjay Prajapati, Himansu Das, Mahendra Kumar Gourisaria


Citations: 0



Feature selection using differential evolution for microarray data classification

Abstract

Microarray datasets are very high-dimensional and contain noise and redundancy. They typically have far more features than samples (more columns than rows), which adversely affects algorithm performance, so a robust technique is required to extract precise information from them. Microarray datasets play a critical role in detecting various diseases, including cancer and tumors. This is where feature selection (FS) comes into play: in recent times it has gained significant importance as a data preparation method, particularly for high-dimensional data. It is preferable to address classification problems with fewer features while maintaining high accuracy, as not all features are necessary to achieve this goal. The primary objective of feature selection is to identify the optimal subset of features. In this context, we employ the Differential Evolution (DE) algorithm, a population-based stochastic search approach that has found widespread use in various scientific and technical domains to solve optimization problems in continuous spaces. We combine DE with three classification algorithms: Random Forest (RF), Decision Tree (DT), and Logistic Regression (LR). Our analysis compares the accuracy achieved by each model on each dataset, as well as the fitness error for each model. The results indicate that the models performed better with feature selection than without it.
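The abstract does not give the encoding, fitness function, or DE variant used, so the following is only a minimal sketch of how DE-based wrapper feature selection can be set up, not the authors' implementation. It assumes a DE/rand/1/bin scheme over real vectors in [0, 1], a 0.5 threshold to turn each vector into a feature mask, and a fitness equal to held-out error plus a small penalty on subset size. To stay dependency-free it uses a nearest-centroid classifier in place of RF, DT, or LR, and synthetic data in place of a microarray dataset; every function name and parameter here (`make_toy_data`, `nearest_centroid_error`, `de_feature_selection`, `F`, `CR`, `penalty`) is hypothetical.

```python
import random


def make_toy_data(n_samples=40, n_features=60, n_informative=3, seed=0):
    """Synthetic stand-in for microarray data: more features than samples."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in range(n_informative):  # only the first columns carry signal
            row[j] += 2.0 * label
        X.append(row)
        y.append(label)
    return X, y


def nearest_centroid_error(X_tr, y_tr, X_te, y_te, mask):
    """Held-out error rate of a nearest-centroid classifier on masked features."""
    idx = [j for j, keep in enumerate(mask) if keep]
    if not idx:
        return 1.0  # an empty subset is treated as the worst case
    centroids = {}
    for label in set(y_tr):
        rows = [x for x, t in zip(X_tr, y_tr) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in idx]
    wrong = 0
    for x, t in zip(X_te, y_te):
        pred = min(
            centroids,
            key=lambda c: sum((x[j] - m) ** 2 for j, m in zip(idx, centroids[c])),
        )
        wrong += pred != t
    return wrong / len(y_te)


def de_feature_selection(X_tr, y_tr, X_te, y_te, pop_size=20, generations=30,
                         F=0.5, CR=0.9, penalty=0.01, seed=0):
    """DE/rand/1/bin over [0, 1]^n; a gene above 0.5 selects that feature."""
    rng = random.Random(seed)
    n = len(X_tr[0])

    def decode(vec):
        return [v > 0.5 for v in vec]

    def fitness(vec):  # lower is better: error plus a small size penalty
        mask = decode(vec)
        err = nearest_centroid_error(X_tr, y_tr, X_te, y_te, mask)
        return err + penalty * sum(mask) / n

    pop = [[rng.random() for _ in range(n)] for _ in range(pop_size)]
    fit = [fitness(v) for v in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct individuals other than i.
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            j_rand = rng.randrange(n)  # guarantees at least one mutated gene
            trial = []
            for j in range(n):
                if rng.random() < CR or j == j_rand:  # binomial crossover
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    trial.append(min(1.0, max(0.0, v)))
                else:
                    trial.append(pop[i][j])
            f_trial = fitness(trial)
            if f_trial <= fit[i]:  # greedy selection keeps the better vector
                pop[i], fit[i] = trial, f_trial
    best = min(range(pop_size), key=fit.__getitem__)
    return decode(pop[best]), fit[best]


X, y = make_toy_data()
X_tr, y_tr, X_te, y_te = X[:20], y[:20], X[20:], y[20:]
mask, best_fit = de_feature_selection(X_tr, y_tr, X_te, y_te)
print("selected", sum(mask), "of", len(mask), "features; fitness", round(best_fit, 4))
```

Because selection is greedy per individual, the best fitness in the population can never get worse from one generation to the next. Scoring candidate subsets on the same held-out split used for reporting is optimistic; a real experiment would typically use cross-validation inside the fitness function and a separate test set.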


Source journal

Discover Internet of Things Internet of Things (IoT)-

CiteScore

7.50

Self-citation rate

0.00%

Articles published

Review time

28 days

About the journal: Discover Internet of Things is part of the Discover journal series committed to providing a streamlined submission process, rapid review and publication, and a high level of author service at every stage. It is an open access, community-focussed journal publishing research from across all fields relevant to the Internet of Things (IoT), providing cutting-edge and state-of-the-art research findings to researchers, academicians, students, and engineers. The journal covers concepts at the component, hardware, and system level as well as programming, operating systems, software, applications and other technology-oriented research topics. It is uniquely interdisciplinary because its scope spans several research communities, ranging from computer systems to communication, optimisation, big data analytics, and application. It is also intended that articles published in Discover Internet of Things may help to support and accelerate Sustainable Development Goal 9: "Build resilient infrastructure, promote inclusive and sustainable industrialization and foster innovation". Discover Internet of Things welcomes all observational, experimental, theoretical, analytical, mathematical modelling, data-driven, and applied approaches that advance the study of all aspects of IoT research.