Karthick Kanagarathinam, R. Manikandan, T. S. Kumar
This research explores the application of machine learning (ML)-based risk prediction models to early diabetes detection, as a decision support aid for healthcare professionals. Diabetes affects millions of people worldwide. Advances in biomedical sciences have generated vast volumes of data, including high-throughput genetic and diagnostic data drawn from extensive health records. Using an initial diabetes risk prediction dataset from the University of California Irvine (UCI) ML repository, our research focused on supervised learning techniques, which made up 85% of the methods employed. The remaining 15% were unsupervised learning approaches, specifically association rules. A key contribution of this study is the development of an optimal prediction model using supervised ML algorithms. The Boruta feature selection algorithm was used to identify relevant features, and the resulting models were validated on a preprocessed dataset containing 10 attributes. The risk prediction models built with random forest, extreme gradient boosting (XGBoost), and light gradient boosting machine (LightGBM) achieved average accuracies of 98.13%, 97.37%, and 97.22%, respectively, as determined by 10-fold cross-validation with 15 repetitions. These models also achieved area under the ROC curve (AUC) values of 1, 0.99, and 0.99, respectively, indicating strong performance in diabetes risk prediction.
Machine learning algorithms-based decision support model for diabetes. Karthick Kanagarathinam, R. Manikandan, T. S. Kumar. Review of Computer Engineering Research. Published 2024-01-11. DOI: https://doi.org/10.18488/76.v11i1.3598
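The evaluation protocol in the diabetes abstract, 10-fold cross-validation with 15 repetitions, amounts to 150 train/test splits whose accuracies are averaged. A minimal standard-library sketch of that bookkeeping follows. The stratified fold assignment, the toy labels, and the majority-class stand-in model are assumptions made for illustration; none of them are details taken from the paper.

```python
import random
from collections import defaultdict


def repeated_stratified_kfold(labels, n_splits=10, n_repeats=15, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated stratified k-fold CV."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    for _ in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        for indices in by_class.values():
            shuffled = indices[:]
            rng.shuffle(shuffled)
            # Deal each class round-robin so every fold keeps the class ratio.
            for pos, idx in enumerate(shuffled):
                folds[pos % n_splits].append(idx)
        for k in range(n_splits):
            test_idx = sorted(folds[k])
            train_idx = sorted(
                i for j, fold in enumerate(folds) if j != k for i in fold
            )
            yield train_idx, test_idx


def mean_cv_accuracy(labels, fit_predict, **cv_kwargs):
    """Average accuracy over all splits.

    fit_predict(train_idx, test_idx) must return one predicted label
    per test index.
    """
    scores = []
    for train_idx, test_idx in repeated_stratified_kfold(labels, **cv_kwargs):
        predicted = fit_predict(train_idx, test_idx)
        correct = sum(p == labels[i] for p, i in zip(predicted, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Toy labels (illustrative only, not the UCI data): 60 negatives, 40 positives.
toy_labels = [0] * 60 + [1] * 40


def majority_baseline(train_idx, test_idx):
    """Stand-in model: predict the most common training label for every test row."""
    train_labels = [toy_labels[i] for i in train_idx]
    majority = max(set(train_labels), key=train_labels.count)
    return [majority] * len(test_idx)


splits = list(repeated_stratified_kfold(toy_labels))
baseline_accuracy = mean_cv_accuracy(toy_labels, majority_baseline)
print(len(splits), round(baseline_accuracy, 4))
```

Replacing `majority_baseline` with a function that fits a real classifier on `train_idx` and predicts on `test_idx` would give the kind of averaged accuracy the abstract reports, and the majority-class score is a useful floor to compare it against.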
The paper investigates software reliability prediction using ensemble learning with random hyperparameter optimization. Software reliability is a significant software quality problem that developers face. It involves accurately predicting the next failure. In recent years, machine learning techniques and ensemble learning approaches have been applied to improve software reliability prediction. These approaches analyze historical failure data and build models that forecast when failures are likely to occur. The article proposes an ensemble learning regression model using Ridge, Bayesian Ridge, Support Vector Regressor (SVR), K-Nearest Neighbors (KNN), Regression Tree, Random Forest, Neural Network, and Decision Tree as base learners. Ridge is used as the combiner model. Each base learner's hyperparameters are tuned automatically using a random search algorithm, which selects hyperparameter values and adjusts them to counter overfitting and underfitting. The base models are tuned to minimize bias and variance. Model performance is evaluated using standard error measures such as Mean Squared Error (MSE), Sum of Squared Error (SSE), and Normalized Root Mean Square Error (NRMSE). The proposed ensemble model is compared with existing models on benchmark datasets. The Iyer and Lee, and Musa datasets are used for the experiments. The datasets are prepared using standard methods such as logarithmic scaling, lagging, and linear interpolation. The statistical comparison shows that the proposed model performs better than existing models.
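The random hyperparameter search described above can be sketched in a few lines. This is a deliberately reduced illustration, not the authors' pipeline: it tunes only the penalty of a single-feature, no-intercept ridge regression, and the function names, the log-uniform sampling range, and the toy data are all assumptions.

```python
import math
import random


def fit_ridge_1d(xs, ys, alpha):
    """Closed-form ridge slope for one feature with no intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)


def validation_mse(weight, xs, ys):
    """Mean squared error of the fitted slope on held-out points."""
    return sum((y - weight * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


def random_search_alpha(train, valid, n_trials=25, low=1e-3, high=1e2, seed=0):
    """Sample alpha log-uniformly and keep the value with the lowest validation MSE."""
    rng = random.Random(seed)
    train_x, train_y = train
    valid_x, valid_y = valid
    trials = []
    for _ in range(n_trials):
        alpha = 10 ** rng.uniform(math.log10(low), math.log10(high))
        weight = fit_ridge_1d(train_x, train_y, alpha)
        trials.append((validation_mse(weight, valid_x, valid_y), alpha))
    best_mse, best_alpha = min(trials)
    return best_alpha, best_mse, trials


# Toy data (illustrative only): roughly y = 2x with small deviations.
train = ([1, 2, 3, 4, 5, 6], [2.1, 3.9, 6.2, 7.8, 10.1, 12.2])
valid = ([7, 8], [14.1, 15.8])

best_alpha, best_mse, trials = random_search_alpha(train, valid)
print(best_alpha, best_mse)
```

Scoring each candidate on held-out data rather than on the training data is what lets the search trade off overfitting against underfitting. In a full ensemble the same loop would be run once per base learner over that learner's own hyperparameter space.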
Software reliability prediction using ensemble learning with random hyperparameter optimization. G. Habtemariam, Sudhir Kumar Mohapatra, H. Seid. Review of Computer Engineering Research. Published 2024-01-10. DOI: https://doi.org/10.18488/76.v11i1.3597
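The three error measures named in the software reliability abstract can be written out directly. In this sketch, SSE and MSE follow their usual definitions. NRMSE has several normalization conventions, and dividing the RMSE by the range of the observed values is an assumption here, since the abstract does not say which convention the paper uses.

```python
import math


def sse(actual, predicted):
    """Sum of squared errors."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))


def mse(actual, predicted):
    """Mean squared error: SSE divided by the number of observations."""
    return sse(actual, predicted) / len(actual)


def nrmse(actual, predicted):
    """RMSE divided by the range of the observed values (assumed convention)."""
    value_range = max(actual) - min(actual)
    if value_range == 0:
        raise ValueError("NRMSE is undefined when all observed values are equal")
    return math.sqrt(mse(actual, predicted)) / value_range


# Toy values (illustrative only): squared errors are 1, 0, 1, 4.
actual = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 4.0, 5.0, 10.0]

print(sse(actual, predicted))  # 6.0
print(mse(actual, predicted))  # 1.5
print(nrmse(actual, predicted))
```

Because NRMSE is scale-free, it is the measure best suited to comparing models across datasets whose failure counts or times differ in magnitude.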
Pneumonia and tuberculosis are major public health problems worldwide. Both diseases affect the lungs and can be fatal if they are not diagnosed correctly and in time. Chest x-ray images are widely used to detect and diagnose pneumonia and tuberculosis. Detecting these diseases from chest x-ray images is difficult and requires experience because their pathological features are similar, and this similarity sometimes leads to misdiagnosis. Several researchers have used deep learning and machine learning techniques to address this misdiagnosis problem. However, these studies used only chest x-ray images to develop pneumonia and tuberculosis detection models, and images alone do not necessarily lead to accurate disease detection and classification. In the traditional or manual approach, medical records are required to support and correctly interpret the chest x-ray images in the appropriate clinical context. This study develops a multi-input pneumonia and tuberculosis detection model that uses both chest x-ray images and medical records, following the clinical procedure. The study applied a Convolutional Neural Network to the chest x-ray image data and a Multilayer Perceptron to the medical record data. We implemented feature-level concatenation to join the output feature vectors from the Convolutional Neural Network and the Multilayer Perceptron in the disease detection model. For comparison, we also developed image-only and medical record-only models. The image-only model achieves an accuracy of 92.68%, the medical record-only model achieves 98.72%, and the combined model improves accuracy to 99.61%. Overall, the study shows that fusing chest x-ray images with medical records leads to better accuracy and more closely resembles the clinical approach.
Pneumonia and tuberculosis detection with chest x-ray images and medical records using deep learning techniques. Sudhir Kumar Mohapatra, Mesfin Abebe, Lidia Mekuanint, Srinivas Prasad, Prasanta Kumar Bala, Sunil Kumar Dhala. Review of Computer Engineering Research. Published 2023-11-28. DOI: https://doi.org/10.18488/76.v10i4.3533
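Feature-level concatenation, as described in the pneumonia and tuberculosis abstract, joins the feature vector from the image branch with the one from the medical-record branch before a shared classification head. The forward-pass sketch below shows only that fusion step. The branch outputs are hard-coded stand-ins for real CNN and MLP outputs, and the vector sizes, the head weights, and the choice of two output classes are hypothetical values picked for illustration.

```python
import math


def dense(inputs, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(image_features, record_features, weights, biases):
    """Concatenate both feature vectors, then apply a softmax classification head."""
    fused = list(image_features) + list(record_features)
    return fused, softmax(dense(fused, weights, biases))


# Hypothetical branch outputs (stand-ins, not values from the study).
image_features = [0.8, 0.1, 0.4]  # would come from the CNN branch
record_features = [0.3, 0.9]      # would come from the MLP branch

# Arbitrary head parameters: 2 output classes x 5 fused inputs.
head_weights = [
    [0.5, -0.2, 0.1, 0.4, -0.3],
    [-0.4, 0.3, 0.2, -0.1, 0.6],
]
head_biases = [0.0, 0.1]

fused, probabilities = fuse_and_classify(
    image_features, record_features, head_weights, head_biases
)
print(fused)
print(probabilities)
```

Fusing at the feature level, rather than averaging the two models' final predictions, lets the classification head learn interactions between image evidence and record evidence.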