With the arrival of Industry 4.0 and the creation of smart factories, predictive maintenance has become even more important, and predictive maintenance systems are now widely used in the manufacturing industry. At the same time, text analysis and Natural Language Processing (NLP) techniques are attracting considerable attention from both research and industry because they can connect natural language with industrial solutions, and the number of NLP studies in the literature is growing rapidly. Although there are studies on NLP in predictive maintenance systems, no studies were found on Turkish NLP for predictive maintenance. This study focuses on the similarity analysis of failure texts that can be used in the predictive maintenance system we developed for VESTEL, one of the leading consumer electronics manufacturers in Turkey. In the manufacturing industry, operators record descriptions of the failures that occur on production lines as short texts, but these descriptions are rarely used in predictive maintenance work. In this study, semantic text similarities between fault descriptions from the production line were compared using traditional word representations, modern word representations, and Transformer models. Levenshtein, Jaccard, Pearson, and Cosine were used as similarity measures, and their effectiveness was compared. The experimental data, including the failure texts, were obtained from a consumer electronics manufacturer in Turkey. The experimental results show that the Jaccard similarity metric is less successful at grouping semantically similar texts than the other three similarity measures. Among the representation methods, Multilingual Universal Sentence Encoder (MUSE), Language-agnostic BERT Sentence Embedding (LaBSE), Bag of Words (BoW), and Term Frequency - Inverse Document Frequency (TF-IDF) outperform FastText and Language-Agnostic Sentence Representations (LASER) in capturing the semantics of failure descriptions. In short, Pearson and Cosine are more effective at finding similar failure texts, and MUSE, LaBSE, BoW, and TF-IDF are more successful at representing the failure text.
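To make the four similarity measures concrete, the following is a minimal sketch in plain Python. It assumes a whitespace tokenizer and simple BoW count vectors. The study's actual preprocessing, embeddings, and implementation are not described in the abstract, so every function name here is hypothetical and the example texts are invented English stand-ins for the Turkish failure texts.

```python
import math
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def jaccard(a: str, b: str) -> float:
    """Overlap of the two token sets divided by their union."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return 1.0 if not union else len(sa & sb) / len(union)


def bow_vectors(a: str, b: str):
    """Bag-of-Words count vectors over the shared vocabulary of both texts."""
    ca, cb = Counter(a.split()), Counter(b.split())
    vocab = sorted(set(ca) | set(cb))
    return [ca[w] for w in vocab], [cb[w] for w in vocab]


def cosine(u, v) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


def pearson(u, v) -> float:
    """Pearson correlation, i.e. cosine similarity of the mean-centred vectors.

    Returns 0.0 when a vector is empty or constant (correlation undefined).
    """
    n = len(u)
    if n == 0:
        return 0.0
    mu, mv = sum(u) / n, sum(v) / n
    return cosine([x - mu for x in u], [y - mv for y in v])


if __name__ == "__main__":
    # Invented example texts, not taken from the study's data.
    t1 = "conveyor belt motor stopped"
    t2 = "conveyor motor stopped suddenly"
    u, v = bow_vectors(t1, t2)
    print("levenshtein:", levenshtein_similarity(t1, t2))
    print("jaccard:    ", jaccard(t1, t2))
    print("cosine:     ", cosine(u, v))
    print("pearson:    ", pearson(u, v))
```

Levenshtein and Jaccard operate directly on the surface text, whereas Cosine and Pearson operate on vectors, so the latter two can be applied unchanged to sentence embeddings such as those produced by MUSE or LaBSE.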
{"title":"Semantic Similarity Comparison Between Production Line Failures for Predictive Maintenance","authors":"Hilal Tekgöz, Sevinç İlhan Omurca, Kadir Koc, Umut Topçu, Osman Çeli̇k","doi":"10.54569/aair.1142568","DOIUrl":"https://doi.org/10.54569/aair.1142568","url":null,"abstract":"With the introduction of Industry 4.0 into our lives and the creation of smart factories, predictive maintenance has become even more important. Predictive maintenance systems are often used in the manufacturing industry. On the other hand, text analysis and Natural Language Processing (NLP) techniques are gaining a lot of attention by both research and industry due to their ability to combine natural languages and industrial solutions. There is a great increase in the number of studies on NLP in the literature. Even though there are studies in the field of NLP in predictive maintenance systems, no studies were found on Turkish NLP for predictive maintenance. This study focuses on the similarity analysis of failure texts that can be used in the predictive maintenance system we developed for VESTEL, one of the leading consumer electronics manufacturers in Turkey. In the manufacturing industry, operators record descriptions of failure that occur on production lines as short texts. However, these descriptions are not often used in predictive maintenance work. In this study, semantic text similarities between fault definitions in the production line were compared using traditional word representations, modern word representations and Transformer models. Levenshtein, Jaccard, Pearson, and Cosine scales were used as similarity measures and the effectiveness of these measures were compared. Experimental data including failure texts were obtained from a consumer electronics manufacturer in Turkey. When the experimental results are examined, it is seen that the Jaccard similarity metric is not successful in grouping semantic similarities according to the other three similarity measures. 
In addition, Multilingual Universal Sentence Encoder (MUSE), Language-agnostic BERT Sentence Embedding (LAbSE), Bag of Words (BoW) and Term Frequency - Inverse Document Frequency (TF-IDF) outperform FastText and Language-Agnostic Sentence Representations (LASER) models in semantic discovery of error identification in embedding methods. Briefly to conclude, Pearson and Cosine are more effective at finding similar failure texts; MUSE, LAbSE, BoW and TF-IDF methods are more successful at representing the failure text.","PeriodicalId":286492,"journal":{"name":"Advances in Artificial Intelligence Research","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130864604","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data mining is the process of extracting useful information from large-scale data in an understandable and logical way. The main machine learning techniques used in data mining are classification and regression, association rules, and cluster analysis. Classification and regression are known as predictive models, while clustering and association rules are known as descriptive models. In this study, the classification method was used, which aims to assign each sample to one of a set of predefined classes. The data set was obtained from the UC Irvine Machine Learning Repository. The dataset, named “Breast cancer”, consists of 699 samples and 10 features collected by William H. at the University of Wisconsin hospital. The data describe characteristics of cells analyzed in the detection of breast cancer, including cell division, and whether the cells are benign or malignant. The classification determines whether a given person has cancerous or non-cancerous cells. In this context, data mining analyses were performed with the SVM (Support Vector Machine) and Random Forest algorithms using the WEKA and Orange programs. The analysis results were then compared with those of previous studies on the same data set. The conclusions of the study are intended to guide medical professionals working on the diagnosis of breast cancer.
{"title":"Using Classification Algorithms in Data Mining in Diagnosing Breast Cancer","authors":"İrem DÜZDAR ARGUN, B. Nalbant","doi":"10.54569/aair.1142519","DOIUrl":"https://doi.org/10.54569/aair.1142519","url":null,"abstract":"Data mining is the process of extracting useful information from large-scale data in an understandable and logical way. According to the main machine learning techniques of data mining; classification and regression, association rules and cluster analysis. Classification and regression are known as predictive models, and clustering and association rules are known as descriptive models. In this study, the classification method was used. With this method, it is aimed to assign a data set to one of the previously determined different classes. The data set used in the study was obtained from the UCIrvine Machine Learning Repository database. The dataset named “Breast cancer” consists of breast cancer data consisting of 699 samples and 10 features collected by William H. at the University of Wisconsin hospital. The data content includes information about the characteristics of some cells analyzed in the detection of breast cancer, cell division, and whether they are benign or malignant. Upon completion of the study, a classification process is performed by determining whether the targeted person has cancerous or non-cancerous cells. In the study carried out in this context; Data mining analyzes were performed using WEKA and Orange programs, SVM (Support Vector Machine), Random Forest algorithms. Along with the analysis results, a comparison was made on the data set, taking into account the previous studies. 
It is aimed that the conclusions obtained at the end of the study will guide medical professionals working in this field in the diagnosis of breast cancer.","PeriodicalId":286492,"journal":{"name":"Advances in Artificial Intelligence Research","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125754036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Heart disease is one of the most common causes of death globally. In this study, machine learning algorithms and models widely used in the literature to predict heart disease have been extensively compared, and a hybrid feature selection method based on genetic algorithm and tabu search has been developed. The proposed system consists of three components: (1) preprocessing of the datasets, (2) feature selection with the genetic and tabu search algorithms, and (3) a classification module. The models have been tested on different datasets, and detailed comparisons and analyses are presented. The experimental results show that the Random Forest algorithm is more successful than AdaBoost, Bagging, LogitBoost, and Support Vector Machine on the Cleveland and Statlog datasets.
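As a rough illustration of the feature selection component, the sketch below shows only the tabu search half, applied to feature subsets. The abstract does not describe how the genetic algorithm and tabu search are combined, so this is not the authors' method. The function names, the flip-one-feature neighbourhood, and the toy scoring function are all assumptions. In practice the score would be something like the cross-validated accuracy of a classifier trained on the selected features.

```python
from collections import deque
from typing import Callable, FrozenSet, Tuple


def tabu_feature_search(
    n_features: int,
    score: Callable[[FrozenSet[int]], float],
    iterations: int = 20,
    tabu_size: int = 3,
) -> Tuple[FrozenSet[int], float]:
    """Search feature subsets by flipping one feature per step.

    Recently flipped features are tabu (cannot be flipped again) unless the
    move would beat the best score seen so far (aspiration criterion).
    """
    current: FrozenSet[int] = frozenset()
    best, best_score = current, score(current)
    tabu = deque(maxlen=tabu_size)  # indices of the most recently flipped features

    for _ in range(iterations):
        candidates = []
        for f in range(n_features):
            neighbour = current ^ {f}  # add f if absent, remove it if present
            s = score(neighbour)
            if f in tabu and s <= best_score:
                continue  # tabu move that does not improve on the best: skip
            candidates.append((s, f, neighbour))
        if not candidates:
            break
        # Take the best admissible move even if it is worse than the current
        # subset; this is what lets tabu search leave local optima.
        s, f, current = max(candidates, key=lambda c: (c[0], -c[1]))
        tabu.append(f)
        if s > best_score:
            best, best_score = current, s
    return best, best_score


# Hypothetical scoring function: features 0, 2 and 5 are "useful", and every
# other selected feature costs 0.5. A stand-in for real classifier accuracy.
RELEVANT = frozenset({0, 2, 5})


def toy_score(subset: FrozenSet[int]) -> float:
    return len(subset & RELEVANT) - 0.5 * len(subset - RELEVANT)


if __name__ == "__main__":
    subset, value = tabu_feature_search(n_features=6, score=toy_score)
    print(sorted(subset), value)
```

One plausible way to build a hybrid is to use a routine like this as a local refinement step for individuals produced by a genetic algorithm, but that pairing is a guess rather than something the abstract states.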
{"title":"Machine Learning-Based Comparative Study For Heart Disease Prediction","authors":"Merve Güllü, M. Ali Akcayol, N. Barışçı","doi":"10.54569/aair.1145616","DOIUrl":"https://doi.org/10.54569/aair.1145616","url":null,"abstract":"Heart disease is one of the most common causes of death globally. In this study, machine learning algorithms and models widely used in the literature to predict heart disease have been extensively compared, and a hybrid feature selection based on genetic algorithm and tabu search methods have been developed. The proposed system consists of three components: (1) preprocess of datasets, (2) feature selection with genetic and tabu search algorithm, and (3) classification module. The models have been tested using different datasets, and detailed comparisons and analysis were presented. The experimental results show that the Random Forest algorithm is more successful than Adaboost, Bagging, Logitboost, and Support Vector machine using Cleveland and Statlog datasets.","PeriodicalId":286492,"journal":{"name":"Advances in Artificial Intelligence Research","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116465655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
For the sustainable development of nations and to lessen the negative environmental effects of fossil fuels, cleaner and renewable energy sources are now required. Solar energy is one of the most significant of these sources. To use solar energy more efficiently in a particular area, it is crucial to know the local solar radiation levels. Accurate calculation of solar energy is also critical for research into climate change, one of the biggest global challenges. Systems that use solar energy are now widely deployed to address the rising global demand for energy. The high spatial and temporal resolution global, diffuse, and direct solar radiation data needed for the design and effective operation of solar power plants are now provided by satellite-based solar radiation estimates. In this work, satellite-based forecasting models were used to estimate diffuse solar radiation for the chosen region, and the solar irradiance values of the region were estimated using a curve fitting approach. The Ångström coefficients were determined using Matlab. Various statistical error tests were used to evaluate the performance of the constructed model. The results show that the prediction models perform well.
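The curve fitting step can be sketched as follows, assuming the standard Ångström-Prescott form H/H0 = a + b (S/S0). Here H/H0 is the ratio of measured to extraterrestrial radiation, and S/S0 is the ratio of measured to maximum possible sunshine duration. The abstract does not state the exact model or error metrics used, and the original work used Matlab, so this Python version, its function names, and its synthetic data are illustrative assumptions only. RMSE and MBE are shown as two commonly used error statistics.

```python
import math
from typing import Sequence, Tuple


def fit_angstrom(
    sunshine_fraction: Sequence[float], clearness_index: Sequence[float]
) -> Tuple[float, float]:
    """Least-squares fit of H/H0 = a + b * (S/S0); returns (a, b)."""
    n = len(sunshine_fraction)
    mean_x = sum(sunshine_fraction) / n
    mean_y = sum(clearness_index) / n
    sxx = sum((x - mean_x) ** 2 for x in sunshine_fraction)
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(sunshine_fraction, clearness_index)
    )
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between observations and model predictions."""
    return math.sqrt(
        sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )


def mbe(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean bias error; positive values mean the model overestimates."""
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)


if __name__ == "__main__":
    # Synthetic values lying exactly on H/H0 = 0.25 + 0.50 * (S/S0);
    # these are not measurements from the study.
    s_ratio = [0.2, 0.4, 0.6, 0.8]
    h_ratio = [0.35, 0.45, 0.55, 0.65]
    a, b = fit_angstrom(s_ratio, h_ratio)
    fitted = [a + b * x for x in s_ratio]
    print(f"a = {a:.3f}, b = {b:.3f}")
    print(f"RMSE = {rmse(h_ratio, fitted):.2e}, MBE = {mbe(h_ratio, fitted):.2e}")
```

With real monthly data the points scatter around the line, and the error statistics indicate how well the fitted coefficients describe the region.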
{"title":"Determination of Angstorm Coefficients with curve fitting method by using Matlab Program","authors":"A. Kaplan, Alper Kaplan","doi":"10.54569/aair.1139183","DOIUrl":"https://doi.org/10.54569/aair.1139183","url":null,"abstract":"For the sustainable development of nations and to lessen the negative environmental effects of fossil fuels, more clean and renewable energy sources are now required. One of the most significant energy sources is solar energy. To utilize solar energy more efficiently in a particular area, it is crucial to be aware of the solar radiation levels. Furthermore, it's critical to accurately calculate solar energy for study into climate change, one of the biggest global challenges. Systems that utilise solar energy are frequently used nowadays to address the rising global need for energy. The high geographical and temporal resolution, global, diffuse, and direct sunlight data needed for the design and effective operation of solar power plants are now provided by satellite-based solar radiation predictions. In this work, satellite-based forecasting models were used to estimate diffuse solar radiation for the chosen region. In this study, the solar radiation irradiance values of the chosen region were estimated using the curve fitting approach. Angstorm coefficients were determined using the Matlab program for this investigation. Various statistical error analysis tests were used to evaluate how well the constructed model performed. 
The findings collected unequivocally demonstrate that the provided prediction models perform well.","PeriodicalId":286492,"journal":{"name":"Advances in Artificial Intelligence Research","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128472404","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Buse Nur Karaman, Zeynep Bağdatli, Nilay Nisa Taçyildiz, Sude Çi̇ğni̇taş, Derya Kandaz, M. K. Ucar
Objective: Cardiovascular Disease (CVD) is a disease that negatively affects the blood vessel system through plaque formed by accumulation on the inner wall of the vessels. In the diagnostic phase, angiography results are evaluated by physicians. Because existing diagnostic methods are time-consuming and costly, new artificial intelligence-based diagnostic algorithms are needed for diagnosing CVD. Materials and Methods: The heart disease dataset available on the open-source sharing site Kaggle was used in the study. The dataset includes 14 clinical findings. After the features were selected with the Fisher feature selection algorithm, they were classified with Ensemble Decision Trees (EDT), the k-Nearest Neighbors algorithm (kNN), and Neural Networks (NN). A hybrid artificial intelligence algorithm was also created using the three methods. Results: According to the classification results, EDT, kNN, NN, and the hybrid artificial intelligence algorithm detected CVD with success rates of 96.19%, 100%, 86.17%, and 99.3%, respectively. Conclusion: The results indicate that the proposed hybrid artificial intelligence algorithms for CVD diagnosis can be used in practice.
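The feature selection and hybrid steps can be illustrated with a small sketch. It assumes the common Fisher score definition (between-class scatter divided by within-class scatter, per feature) and a simple majority vote over the three classifiers' predictions. The abstract does not specify either formulation, so both functions and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Hashable, List, Sequence


def fisher_scores(X: Sequence[Sequence[float]], y: Sequence[Hashable]) -> List[float]:
    """Fisher score per feature: sum_c n_c (mu_c - mu)^2 / sum_c n_c var_c.

    Higher scores mean the feature separates the classes better.
    """
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)

    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall_mean = sum(column) / len(column)
        between = 0.0
        within = 0.0
        for rows in groups.values():
            values = [r[j] for r in rows]
            mean = sum(values) / len(values)
            variance = sum((v - mean) ** 2 for v in values) / len(values)
            between += len(values) * (mean - overall_mean) ** 2
            within += len(values) * variance
        if within > 0:
            scores.append(between / within)
        else:
            # No within-class spread: perfectly separating or constant feature.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def majority_vote(predictions: Sequence[Hashable]) -> Hashable:
    """Return the label predicted by most classifiers (one assumed hybrid rule)."""
    return Counter(predictions).most_common(1)[0][0]


if __name__ == "__main__":
    # Toy data: feature 0 separates the two classes, feature 1 does not.
    X = [[1.0, 5.0], [2.0, 3.0], [8.0, 4.0], [9.0, 4.0]]
    y = [0, 0, 1, 1]
    print(fisher_scores(X, y))
    # Hypothetical predictions from EDT, kNN and NN for one patient.
    print(majority_vote([1, 1, 0]))
```

Features would then be ranked by score and the top-ranked ones passed to the classifiers. How many features the study kept is not stated in the abstract.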
{"title":"HYBRID ARTIFICIAL INTELLIGENCE-BASED ALGORITHM DESIGN FOR CARDIOVASCULAR DISEASE DETECTION","authors":"Buse Nur Karaman, Zeynep Bağdatli, Nilay Nisa Taçyildiz, Sude Çi̇ğni̇taş, Derya Kandaz, M. K. Ucar","doi":"10.54569/aair.1141465","DOIUrl":"https://doi.org/10.54569/aair.1141465","url":null,"abstract":"Objective: Cardiovascular Disease (CVD) is a disease that negatively affects the blood vessel system due to plaque formation as a result of accumulation on the inner wall of the vessels. In the diagnostic phase, angiography results are evaluated by physicians. New diagnostic algorithms based on artificial intelligence, including new technologies, are needed for diagnosing CVD due to the time-consuming and high cost of diagnostic methods. \u0000Materials and Methods: The heart disease dataset available on the open-source sharing site Kaggle was used in the study. The dataset includes 14 clinical findings. In the study, after the features were selected with the Fischer feature selection algorithm, they were classified with Ensemble Decision Trees (EDT), k-Nearest Neighborhood Algorithm (kNN), and Neural Networks (NN). A hybrid artificial intelligence algorithm was also created using the three methods. \u0000Results: According to the classification results, EDT %96.19, kNN %100, NN %86.17, and hybrid artificial intelligence determined CVD with a %99.3 success rate. 
\u0000Conclusion: According to the obtained results, it is evaluated that the proposed CVD diagnosis hybrid artificial intelligence algorithms can be used in practice","PeriodicalId":286492,"journal":{"name":"Advances in Artificial Intelligence Research","volume":"167 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114751109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This study examines the least-squares method for the simple linear regression model. The least-squares line fits the data poorly when the dataset contains outliers that distort the results. The study aims to develop a method for obtaining a line that fits the data better when outliers are present.
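The effect described above can be seen in a small sketch that compares the ordinary least-squares line with a robust alternative on data containing one outlier. The abstract does not describe the authors' proposed method, so the Theil-Sen estimator (median of pairwise slopes) is used here purely as a well-known stand-in. The function names and data are illustrative.

```python
from itertools import combinations
from statistics import median
from typing import Sequence, Tuple


def ols_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def theil_sen_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Theil-Sen fit: median of all pairwise slopes, then median intercept."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
        if x2 != x1
    ]
    slope = median(slopes)
    intercept = median(y - slope * x for x, y in zip(xs, ys))
    return slope, intercept


if __name__ == "__main__":
    # Points on y = 2x, except the last one, which is a deliberate outlier.
    xs = [1, 2, 3, 4, 5, 6]
    ys = [2, 4, 6, 8, 10, 40]
    print("OLS:      ", ols_line(xs, ys))
    print("Theil-Sen:", theil_sen_line(xs, ys))
```

A single outlier pulls the least-squares slope far from 2, while the median-based estimate stays on the trend of the remaining points. This is the kind of behaviour a more outlier-resistant fitting method aims for.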
{"title":"AN APPROACH TOWARDS THE LEAST-SQUARES METHOD FOR SIMPLE LINEAR REGRESSION","authors":"Hasan Halit Tali̇, Ceren Çelti̇","doi":"10.54569/aair.1032607","DOIUrl":"https://doi.org/10.54569/aair.1032607","url":null,"abstract":"This study approaches the least-squares method for simple linear regression model. The least-squares line does not comply with the data when there are outliers that have deceptive effects on the results in the dataset. The study aims to develop a method for obtaining a line that complies more with the data when there are outliers in the dataset.","PeriodicalId":286492,"journal":{"name":"Advances in Artificial Intelligence Research","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131505900","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Artificial Neural Network Model Based on Experimental Measurements for Estimating the Grounding Resistance","authors":"A. Kayabasi, Berat Yıldız, S. Balci","doi":"10.54569/aair.1016850","DOIUrl":"https://doi.org/10.54569/aair.1016850","url":null,"abstract":"","PeriodicalId":286492,"journal":{"name":"Advances in Artificial Intelligence Research","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124044202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
B. Aksoy, Uygar Usta, Gürkan Karadağ, Ali Rıza Kaya, Melek Ömür
{"title":"Classification of environmental sounds with deep learning","authors":"B. Aksoy, Uygar Usta, Gürkan Karadağ, Ali Rıza Kaya, Melek Ömür","doi":"10.54569/aair.1017801","DOIUrl":"https://doi.org/10.54569/aair.1017801","url":null,"abstract":"","PeriodicalId":286492,"journal":{"name":"Advances in Artificial Intelligence Research","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126133364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yasin Görmez, Halil Arslan, Suat Sari, Mücahid Daniş
{"title":"SALDA-ML: Machine Learning Based System Design to Predict Salary In-crease","authors":"Yasin Görmez, Halil Arslan, Suat Sari, Mücahid Daniş","doi":"10.54569/aair.1029836","DOIUrl":"https://doi.org/10.54569/aair.1029836","url":null,"abstract":"","PeriodicalId":286492,"journal":{"name":"Advances in Artificial Intelligence Research","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127722534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
H. Bayrakçi, Abdullah Burak Keşkekçi, Recep Arslan
{"title":"Classification of Iris Flower by Random Forest Algorithm","authors":"H. Bayrakçi, Abdullah Burak Keşkekçi, Recep Arslan","doi":"10.54569/aair.1018444","DOIUrl":"https://doi.org/10.54569/aair.1018444","url":null,"abstract":"","PeriodicalId":286492,"journal":{"name":"Advances in Artificial Intelligence Research","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129136007","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}