Diabetes, a long-term metabolic disorder, causes persistently high blood sugar and presents a significant global health challenge. Early diagnosis is vital to mitigating its effects. This study investigates diabetes diagnosis and risk prediction using a comprehensive diabetes dataset created in 2023. The dataset contains clinical and anthropometric data of patients. Data simplification was applied to remove unnecessary information and reduce data dimensionality. In addition, methods such as Principal Component Analysis (PCA) were applied to decrease the number of variables in the dataset. These steps made the dataset more manageable and improved diagnostic performance. The dataset covers health data for a total of 100,000 individuals and consists of 8 input features and 1 output feature. The primary objective is to determine the algorithm that performs best for diabetes diagnosis. No missing data were found during preprocessing, and the necessary transformations were carried out. Nine different machine learning algorithms were applied to the dataset, each representing a different modelling approach, and each was evaluated on its performance in diagnosing diabetes. The results show that machine learning models can predict both the presence of diabetes and the risk of developing it in healthy individuals. In particular, the random forest model gave the best results across all performance metrics. These findings can inform future research on diabetes diagnosis and risk prediction. Dimensionality reduction techniques proved valuable in the data analysis and show potential to facilitate diabetes diagnosis, thereby improving patients' quality of life.
Yavuz Bahadir Koca, Elif Aktepe. "Effect of dimension reduction with PCA and machine learning algorithms on diabetes diagnosis performance." Turkish Journal of Engineering. Published 2024-07-05. DOI: 10.31127/tuje.1413087. https://doi.org/10.31127/tuje.1413087
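The PCA step mentioned in the diabetes abstract above can be illustrated with a minimal sketch. It finds only the first principal component, using power iteration on the sample covariance matrix, and relies on the Python standard library alone. The function names, the toy two-feature data, and the choice of power iteration are all assumptions made for illustration; the abstract does not say how PCA was computed or how many components were retained.

```python
import math

def center(X):
    """Subtract the column mean from every feature."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in X]

def covariance_matrix(Xc):
    """Sample covariance (divides by n - 1) of already centred data."""
    n, d = len(Xc), len(Xc[0])
    return [[sum(row[i] * row[j] for row in Xc) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, n_iter=500):
    """Leading eigenvector and eigenvalue of cov via power iteration.

    Assumes the start vector is not orthogonal to the leading eigenvector.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the variance along v.
    cv = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(vi * ci for vi, ci in zip(v, cv))
    return v, eigenvalue

def project(Xc, v):
    """Score of each centred sample along the component v."""
    return [sum(a * b for a, b in zip(row, v)) for row in Xc]

# Made-up two-feature data, for illustration only (not from the study).
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
Xc = center(X)
cov = covariance_matrix(Xc)
component, variance = first_component(cov)
explained_ratio = variance / sum(cov[i][i] for i in range(len(cov)))
scores = project(Xc, component)
print(component, explained_ratio, scores)
```

In practice a library routine that returns all components would normally be preferred; the sketch only shows the mechanics of reducing correlated features to fewer variables.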
This study aims to use a machine learning (ML) model to accurately classify cotton crop leaves from four datasets as either infected or healthy. The four datasets cover bacterial blight, Curly virus, Fusarium wilt, and healthy leaves. ML is a useful tool for detecting cotton leaf diseases and can help reduce the rate of disease. Without ML techniques, detecting these diseases is difficult and time consuming. To address this problem, an ML model is proposed, and its accuracy is tested using a confusion matrix. Earlier researchers have also used ML models to diagnose these diseases, but the results given by the different models were not accurate. The target of the study was to identify diseases affecting the cotton plant in the early stages using traditional techniques; however, various image processing techniques and ML algorithms, including a convolutional neural network, proved helpful in diagnosing the diseases. This technological approach can simplify the detection of damaged leaves and reduce the effort farmers spend detecting those diseases. Cotton is a natural fiber produced on a large scale, and it is grown on 2.5% of overall agricultural land. Detecting cotton leaf diseases is crucial to maintaining the crop's productivity and providing reliable earnings to farmers. A confusion matrix is an N × N matrix used to evaluate the performance of a classification model, where N is the number of target classes. The matrix compares the actual target values with those predicted by the ML model. The technique uses four parameters to test the accuracy of the results, which are given in this research work.
Unain Hyder, Mir Rahib Hussain. "Detection of cotton leaf disease with machine learning model." Turkish Journal of Engineering. Published 2024-04-18. DOI: 10.31127/tuje.1406755. https://doi.org/10.31127/tuje.1406755
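The N × N confusion matrix described in the cotton leaf abstract above can be sketched as follows, with rows for the actual classes and columns for the predicted ones. The class identifiers and the labels are made up for illustration and are not results from the study. The abstract does not name its "four parameters"; the one-vs-rest counts of true positives, false positives, false negatives, and true negatives computed here are one common reading, and treating them as the four parameters is an assumption.

```python
def confusion_matrix(actual, predicted, classes):
    """Return an N x N matrix: rows are actual classes, columns are predicted."""
    index = {label: i for i, label in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

def accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0

def per_class_counts(matrix, k):
    """One-vs-rest (TP, FP, FN, TN) for the class at index k."""
    n = len(matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                        # actual k, predicted other
    fp = sum(matrix[i][k] for i in range(n)) - tp   # actual other, predicted k
    tn = sum(sum(row) for row in matrix) - tp - fn - fp
    return tp, fp, fn, tn

# Hypothetical class identifiers and labels, for illustration only.
CLASSES = ["bacterial_blight", "curl_virus", "fusarium_wilt", "healthy"]
actual = ["bacterial_blight", "curl_virus", "fusarium_wilt",
          "healthy", "healthy", "curl_virus"]
predicted = ["bacterial_blight", "curl_virus", "healthy",
             "healthy", "healthy", "bacterial_blight"]

cm = confusion_matrix(actual, predicted, CLASSES)
for label, row in zip(CLASSES, cm):
    print(f"{label:>16}: {row}")
print("accuracy:", accuracy(cm))
print("healthy (TP, FP, FN, TN):", per_class_counts(cm, CLASSES.index("healthy")))
```

With four target classes the matrix is 4 × 4, and every off-diagonal cell shows which pair of classes the model confuses.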
Gasoline is one of the most sought-after resources in the world, where the need for energy is indispensable to human life and continuously increasing. A shortage of gasoline may negatively affect national economies. Therefore, analyses and estimates of gasoline consumption are critical. Better forecasts of gasoline consumption can serve policymakers, managers, researchers, and other gasoline sector stakeholders. In line with the world economy, gasoline is among the most consumed energy sources in Turkey. This study therefore aims to forecast daily gasoline consumption in Turkey. For this purpose, a lasso regression-based forecasting methodology is proposed. The forecasting approach consists of three main stages: i) cleaning the data, ii) extracting and selecting features, and iii) forecasting the daily gasoline consumption time series with the proposed models. In addition, ridge regression is used as a comparison for the performance of the proposed model.
Ertugrul Ayyıldız, Mirac Murat. "A lasso regression-based forecasting model for daily gasoline consumption: Türkiye Case." Turkish Journal of Engineering. Published 2024-01-04. DOI: 10.31127/tuje.1354501. https://doi.org/10.31127/tuje.1354501
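The three stages named in the gasoline abstract above (cleaning, feature extraction, forecasting) can be sketched in plain Python, with lasso and ridge fitted by coordinate descent on centred data. Everything specific here is an assumption for illustration: the lagged-value features, the cleaning rule, the penalty strength `alpha`, and the made-up series. The abstract does not describe the authors' features or settings.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; the core of the lasso update."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_coordinate_descent(X, y, alpha, penalty="lasso", n_iter=200):
    """Minimise (1/2n)||y - Xw - b||^2 plus an L1 (lasso) or L2 (ridge) penalty."""
    n, d = len(X), len(X[0])
    # Centre features and target so the intercept can be recovered afterwards.
    x_mean = [sum(row[j] for row in X) / n for j in range(d)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(d)] for row in X]
    w = [0.0] * d
    residual = [v - y_mean for v in y]  # equals yc - Xc.w while w is all zeros
    for _ in range(n_iter):
        for j in range(d):
            col = [row[j] for row in Xc]
            z = sum(c * c for c in col) / n
            if z == 0.0:
                continue  # constant feature carries no information
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, residual)) / n
            if penalty == "lasso":
                new_w = soft_threshold(rho, alpha) / z
            else:  # ridge, penalty term (alpha / 2) * ||w||^2
                new_w = rho / (z + alpha)
            delta = new_w - w[j]
            residual = [r - c * delta for r, c in zip(residual, col)]
            w[j] = new_w
    intercept = y_mean - sum(wj * m for wj, m in zip(w, x_mean))
    return w, intercept

def clean(series):
    """Stage i (assumed rule): drop missing and negative readings."""
    return [v for v in series if v is not None and v >= 0]

def lag_features(series, n_lags):
    """Stage ii (assumed features): the previous n_lags values predict the next."""
    X = [series[t - n_lags:t] for t in range(n_lags, len(series))]
    return X, series[n_lags:]

def predict(w, intercept, x):
    return intercept + sum(wi * xi for wi, xi in zip(w, x))

# Made-up daily consumption values, for illustration only.
raw = [50.0, 52.0, None, 51.0, 55.0, 53.0, 56.0, 58.0, 57.0, 60.0]
series = clean(raw)
X, y = lag_features(series, n_lags=3)
for name in ("lasso", "ridge"):
    w, b = fit_coordinate_descent(X, y, alpha=0.1, penalty=name)
    # Stage iii: one-step-ahead forecast from the last three observations.
    print(name, w, predict(w, b, series[-3:]))
```

The difference between the two penalties shows in the update rule: the lasso's soft threshold can set a weight exactly to zero, which acts as feature selection, while ridge only shrinks weights.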