{"title":"A comparative study of blind source separation methods","authors":"Burak Baysal, Mehmet Önder Efe","doi":"10.55730/1300-0632.4047","DOIUrl":"https://doi.org/10.55730/1300-0632.4047","url":null,"abstract":"","PeriodicalId":49410,"journal":{"name":"Turkish Journal of Electrical Engineering and Computer Sciences","volume":"30 6","pages":""},"PeriodicalIF":1.1,"publicationDate":"2023-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139205358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Feature distillation from vision-language model for semisupervised action classification","authors":"ASLI ÇELİK, AYHAN KÜÇÜKMANİSA, OĞUZHAN URHAN","doi":"10.55730/1300-0632.4038","DOIUrl":"https://doi.org/10.55730/1300-0632.4038","url":null,"abstract":"The training of supervised machine learning approaches is critically dependent on annotating large-scale datasets. Semisupervised learning approaches aim to achieve performance comparable to supervised methods using relatively less annotation without sacrificing good generalization capacity. In line with this objective, ways of leveraging unlabeled data have been the subject of intense research. However, semisupervised video action recognition has received relatively less attention compared to image domain implementations. Existing semisupervised video action recognition methods trained from scratch rely heavily on augmentation techniques, complex architectures, and/or the use of other modalities, while distillation-based methods use models that have only been trained for 2D computer vision tasks. In another line of work, pretrained vision-language models have shown very promising results for generating general-purpose visual features with reports of high zero-shot performance for many downstream tasks. In this work, we exploit a language-supervised visual encoder for learning video representations for video action classification tasks. We propose a teacher-student learning paradigm through feature distillation and pseudo-labeling. Our experimental results are a proof-of-concept revealing that multimodal feature extractors can be utilized for spatiotemporal feature extraction in a semisupervised learning context and show performance comparable to SOTA methods, especially in a low-label regime.","PeriodicalId":49410,"journal":{"name":"Turkish Journal of Electrical Engineering and Computer Sciences","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135302579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TRCaptionNet: A novel and accurate deep Turkish image captioning model with vision transformer based image encoders and deep linguistic text decoders","authors":"SERDAR YILDIZ, ABBAS MEMİŞ, SONGÜL VARLI","doi":"10.55730/1300-0632.4035","DOIUrl":"https://doi.org/10.55730/1300-0632.4035","url":null,"abstract":"","PeriodicalId":49410,"journal":{"name":"Turkish Journal of Electrical Engineering and Computer Sciences","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135302578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Infrared imaging segmentation employing an explainable deep neural network","authors":"XINFEI LIAO, DAN WANG, ZAIRAN LI, NILANJAN DEY, RS SIMON, FUQIAN SHI","doi":"10.55730/1300-0632.4032","DOIUrl":"https://doi.org/10.55730/1300-0632.4032","url":null,"abstract":"Explainable AI (XAI) improved by a deep neural network (DNN) of a residual neural network (ResNet) and long short-term memory networks (LSTMs), termed XAIRL, is proposed for segmenting foot infrared imaging datasets. First, an infrared sensor imaging dataset is acquired by a foot infrared sensor imaging device and preprocessed. The infrared sensor image features are then defined and extracted with XAIRL being applied to segment the dataset. This paper compares and discusses our results with XAIRL. Evaluation indices are applied to perform various measurements for foot infrared image segmentation including accuracy, precision, recall, F1 score, intersection over union (IoU), Dice similarity coefficient, mean intersection over union, boundary displacement error (BDE), Hausdorff distance, and receiver operating characteristic (ROC). Compared to results from the literature, XAIRL shows the highest overall performance, achieving accuracy of 0.93, precision of 0.91, recall of 0.95, and F1 score of 0.93. XAIRL also displays the highest IoU, Dice similarity coefficient, and ROC curve and the lowest BDE and Hausdorff distance. Although U-Net performs well for most metrics, Mask R-CNN shows slightly worse performance but still outperforms the random forest and support vector machine algorithms. By building a high-quality foot infrared imaging dataset, machine learning-based algorithms can accurately analyze foot temperature and pressure distribution. These models can then be used to customize shoes for individual wearers, improving their comfort and reducing the risk of foot injuries, particularly for those with high blood pressure.","PeriodicalId":49410,"journal":{"name":"Turkish Journal of Electrical Engineering and Computer Sciences","volume":"101 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135302711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cognitive digital modelling for hyperspectral image classification using transfer learning model","authors":"MOHAMMAD SHABAZ, MUKESH SONI","doi":"10.55730/1300-0632.4033","DOIUrl":"https://doi.org/10.55730/1300-0632.4033","url":null,"abstract":"Deep convolutional neural networks can fully exploit the intrinsic relationships between features and improve the separability of hyperspectral images, and they have received extensive attention in recent years. However, the need for a large number of labelled samples to train deep network models limits the application of such methods. The idea of transfer learning is introduced into remote sensing image classification to reduce the number of labelled samples needed. In particular, the situation in which each class in the target image has only one labelled sample is investigated. In the target domain, the number of training samples is enlarged by the homogeneous region obtained by segmenting the target image. On this basis, the deep Siamese convolutional neural network is used to reduce the distribution difference between the source domain image and the target domain image to achieve the final result of the target hyperspectral image classification. The experimental results show that the combination of homogeneous region and Siamese convolutional network can improve the classification effect of semisupervised transfer learning and better solve cross-regional hyperspectral image classification.","PeriodicalId":49410,"journal":{"name":"Turkish Journal of Electrical Engineering and Computer Sciences","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135302570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hybrid machine learning model to predict chronic kidney diseases using handcrafted features for early health rehabilitation","authors":"AMJAD REHMAN, TANZILA SABA, HAIDER ALI, NARMINE ELHAKIM, NOOR AYESHA","doi":"10.55730/1300-0632.4028","DOIUrl":"https://doi.org/10.55730/1300-0632.4028","url":null,"abstract":"Chronic kidney diseases proliferate due to hypertension, diabetes, anemia, obesity, smoking, etc. Patients with such conditions are sometimes unaware of the first symptoms, complicating disease diagnosis. This paper presents a chronic kidney disease (CKD) prediction model to distinguish CKD patients from non-CKD (NCKD) subjects. The proposed study has two main stages. First, we found the odds ratio through logistic regression (LR) and a comparison test to identify early risk factors from kidneys' MRI and differentiate CKD from NCKD subjects. In stage 2, LR, LDA, and MLP classifiers were applied to predict CKD and NCKD by extracting features from MRI. The odds ratios of blood glucose random and serum creatinine were found to be higher, and the levels of sodium, potassium, packed cell volume, white blood cell count, and red blood cell count were found to be lower in CKD. The comparison results show increased levels of blood glucose random and serum creatinine and decreased levels of sodium, potassium, packed cell volume, white blood cell count, and red blood cell count in CKD patients compared with NCKD subjects. The accuracies of LR were 98.5% and 97.5% for the training and test datasets, respectively. The LDA accuracies were 96.07% and 96.6% for the training and test datasets. Likewise, MLP attained 95% and 94.1% accuracy for the training and test datasets. Finally, we used a 5-fold cross-validation (CV) approach on the LR model. The mean accuracies of LR were 0.954 and 0.942 for training and testing data, respectively. According to LR, serum creatinine, albumin, diabetes mellitus, red blood cell count, pus cell, and hypertension were found to be the most significant features for discriminating CKD patients from NCKD subjects. The proposed strategy is best suited for practical implementation for reducing the disease's prevalence.","PeriodicalId":49410,"journal":{"name":"Turkish Journal of Electrical Engineering and Computer Sciences","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135302713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing exploration-exploitation in harmony search for airborne hyperspectral imaging band selection (E3HS)","authors":"MOHAMMED ABDULMAJEED MOHARRAM, DIVYA MEENA SUNDARAM","doi":"10.55730/1300-0632.4029","DOIUrl":"https://doi.org/10.55730/1300-0632.4029","url":null,"abstract":"Hyperspectral imaging has emerged as a prominent area of research in the field of remote sensing science. However, hyperspectral images (HSIs) pose a notable challenge due to the presence of numerous irrelevant and redundant spectral bands exhibiting high correlation. Therefore, it is necessary to enhance the classification performance for HSI processing by selecting the most relevant discriminative spectral bands. To this end, this paper introduces a metaheuristic search method called enhancing exploration-exploitation in harmony search (E3HS). The standard harmony search suffers from many weaknesses, such as premature convergence and falling easily into the local optimum. Consequently, E3HS was proposed to evade falling into the local optimum by creating a balance between exploration and exploitation strategies to accelerate convergence toward the global optimum solution. Finally, two machine learning classifiers (k-nearest neighbor and support vector machine) were employed for hyperspectral image classification at the pixel level. Moreover, the proposed method was compared with the bat algorithm, Archimedes optimization algorithm, particle swarm optimization, standard harmony search, genetic algorithm, and krill herd algorithm. The experimental results demonstrated significant improvement with overall accuracy equal to 87.49%, 94.85%, and 94.41% for the Indian Pines, Pavia University, and Salinas datasets, respectively.","PeriodicalId":49410,"journal":{"name":"Turkish Journal of Electrical Engineering and Computer Sciences","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135302572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"YOLO and LSH-based video stream analytics landscape for short-term traffic density surveillance at road networks","authors":"LAVANYA K, STUTI TIWARI, RAHUL ANAND, JUDE HEMANTH","doi":"10.55730/1300-0632.4036","DOIUrl":"https://doi.org/10.55730/1300-0632.4036","url":null,"abstract":"Monitoring traffic during rush hour is difficult because modern roadways are getting more crowded every day. The automated solutions that have already been created in this area are ineffective at processing enormous amounts of data in a short amount of time, leading to inefficiency and inconsistent results. The YOLO (you only look once) and LSH (locality sensitive hashing) algorithms are combined with the Kafka architecture in this study to create a method for assessing traffic density in real-time scenarios. Our concept, which is specifically designed for vehicular networks, predicts the traffic density in a given location, a crucial parameter for performing effective traffic diversion by suggesting alternate routes and avoiding traffic congestion, by gathering live stream data from traffic surveillance cameras and transforming it into frames (at a rate of 11 per minute) using the YOLOv3 algorithm. The predicted density is then projected onto Google Maps for the convenience of local clients. The comparative study's results demonstrate that our strategy consistently and accurately predicts vehicular density, with an accuracy of more than 90 percent under all conditions. It also shows a significant improvement of 4.08 percent in both precision and recall.","PeriodicalId":49410,"journal":{"name":"Turkish Journal of Electrical Engineering and Computer Sciences","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135302714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}