Pub Date: 2023-10-07 | DOI: 10.3390/computers12100200
Cervical Cancer Diagnosis Using Stacked Ensemble Model and Optimized Feature Selection: An Explainable Artificial Intelligence Approach
Abdulaziz AlMohimeed, Hager Saleh, Sherif Mostafa, Redhwan M. A. Saad, Amira Samy Talaat
Cervical cancer affects more than half a million women worldwide each year and causes over 300,000 deaths. The main goals of this paper are to study the effect of applying feature selection methods with stacking models for the prediction of cervical cancer, to propose a stacking ensemble that combines different base models with meta-learners to predict cervical cancer, and to explore the black box of the stacking model with the best-optimized features using explainable artificial intelligence (XAI). A cervical cancer dataset from the UCI machine learning repository, which is highly imbalanced and contains missing values, is used. Therefore, SMOTE-Tomek was applied to combine under-sampling and over-sampling to handle the imbalanced data, and pre-processing steps were implemented to handle the missing values. Bayesian optimization tunes the models and selects the best model architecture. Three feature selection techniques, namely chi-square scores, recursive feature elimination, and tree-based feature selection, are applied to the dataset to determine the factors most crucial for predicting cervical cancer. The stacking model is extended to multiple levels: Level 1 (multiple base learners) and Level 2 (meta-learner). At Level 1, stacking (training and testing stacking) combines the outputs of the base models; the training stacking output is used to train the meta-learner models at Level 2, and the testing stacking output is used to evaluate them. The results showed that, based on the features selected by recursive feature elimination (RFE), the stacking model achieves higher accuracy, precision, recall, F1-score, and AUC. Furthermore, to ensure the efficiency, efficacy, and reliability of the produced model, local and global explanations are provided.
{"title":"Cervical Cancer Diagnosis Using Stacked Ensemble Model and Optimized Feature Selection: An Explainable Artificial Intelligence Approach","authors":"Abdulaziz AlMohimeed, Hager Saleh, Sherif Mostafa, Redhwan M. A. Saad, Amira Samy Talaat","doi":"10.3390/computers12100200","DOIUrl":"https://doi.org/10.3390/computers12100200","url":null,"abstract":"Cervical cancer affects more than half a million women worldwide each year and causes over 300,000 deaths. The main goals of this paper are to study the effect of applying feature selection methods with stacking models for the prediction of cervical cancer, propose stacking ensemble learning that combines different models with meta-learners to predict cervical cancer, and explore the black-box of the stacking model with the best-optimized features using explainable artificial intelligence (XAI). A cervical cancer dataset from the machine learning repository (UCI) that is highly imbalanced and contains missing values is used. Therefore, SMOTE-Tomek was used to combine under-sampling and over-sampling to handle imbalanced data, and pre-processing steps are implemented to hold missing values. Bayesian optimization optimizes models and selects the best model architecture. Chi-square scores, recursive feature removal, and tree-based feature selection are three feature selection techniques that are applied to the dataset For determining the factors that are most crucial for predicting cervical cancer, the stacking model is extended to multiple levels: Level 1 (multiple base learners) and Level 2 (meta-learner). At Level 1, stacking (training and testing stacking) is employed for combining the output of multi-base models, while training stacking is used to train meta-learner models at level 2. Testing stacking is used to evaluate meta-learner models. The results showed that based on the selected features from recursive feature elimination (RFE), the stacking model has higher accuracy, precision, recall, f1-score, and AUC. Furthermore, To assure the efficiency, efficacy, and reliability of the produced model, local and global explanations are provided.","PeriodicalId":46292,"journal":{"name":"Computers","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135251942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-10-04 | DOI: 10.3390/computers12100199
Enhancing Learning Personalization in Educational Environments through Ontology-Based Knowledge Representation
William Villegas-Ch, Joselin García-Ortiz
In the digital age, the personalization of learning has become a critical priority in education. This article delves into the cutting edge of educational innovation by exploring the essential role of ontology-based knowledge representation in transforming the educational experience. This research stands out for its significant and distinctive contribution to improving the personalization of learning. To this end, concrete use cases are presented across academic fields, from formal education to corporate training and online learning. The article shows how ontologies capture and organize knowledge semantically, enabling the intelligent adaptation of content, the inference of activity and resource recommendations, and the creation of highly personalized learning paths. In this context, the novelty lies in the innovative approach to designing educational ontologies, which exhaustively considers different use cases and academic scenarios. Additionally, we examine in depth the design decisions that support the effectiveness and usefulness of these ontologies for effective learning personalization. Practical examples illustrate how implementing ontologies transforms education, offering richer educational experiences adapted to students’ individual needs. This research represents a valuable contribution to personalized education and knowledge management in contemporary educational environments, and its ability to redefine and improve the personalization of learning in a constantly evolving digital world.
{"title":"Enhancing Learning Personalization in Educational Environments through Ontology-Based Knowledge Representation","authors":"William Villegas-Ch, Joselin García-Ortiz","doi":"10.3390/computers12100199","DOIUrl":"https://doi.org/10.3390/computers12100199","url":null,"abstract":"In the digital age, the personalization of learning has become a critical priority in education. This article delves into the cutting-edge of educational innovation by exploring the essential role of ontology-based knowledge representation in transforming the educational experience. This research stands out for its significant and distinctive contribution to improving the personalization of learning. For this, concrete examples of use cases are presented in various academic fields, from formal education to corporate training and online learning. It is identified how ontologies capture and organize knowledge semantically, allowing the intelligent adaptation of content, the inference of activity and resource recommendations, and the creation of highly personalized learning paths. In this context, the novelty lies in the innovative approach to designing educational ontologies, which exhaustively considers different use cases and academic scenarios. Additionally, we delve deeper into the design decisions that support the effectiveness and usefulness of these ontologies for effective learning personalization. Through practical examples, it is illustrated how the implementation of ontologies transforms education, offering richer educational experiences adapted to students’ individual needs. This research represents a valuable contribution to personalized education and knowledge management in contemporary educational environments. The novelty of this work lies in its ability to redefine and improve the personalization of learning in a constantly evolving digital world.","PeriodicalId":46292,"journal":{"name":"Computers","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135592445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-10-02 | DOI: 10.3390/computers12100198
Exploring the Potential of Distributed Computing Continuum Systems
Praveen Kumar Donta, Ilir Murturi, Victor Casamayor Pujol, Boris Sedlak, Schahram Dustdar
Computing paradigms have evolved significantly in recent decades, moving from large, room-sized resources (processors and memory) to incredibly small computing nodes. Computing power now reaches almost every application field. Distributed computing continuum systems (DCCSs) are ushering in a computing paradigm that unifies various computing resources, including cloud, fog/edge computing, the Internet of Things (IoT), and mobile devices, into a seamless and integrated continuum. This seamless infrastructure efficiently manages diverse processing loads, ensures a consistent user experience, and provides a holistic solution to modern computing needs. In this context, this paper presents a deeper understanding of the potential of DCCSs in today’s computing environment. First, we discuss the evolution of computing paradigms up to the DCCS, covering general architectures, components, and computing devices, and analyzing the benefits and limitations of each paradigm. The discussion then turns to the computing devices that constitute a DCCS and enable the computational goals of current and futuristic applications. In addition, we delve into the key features and benefits of DCCSs from the perspective of current computing needs, and we provide a comprehensive overview of emerging applications (with a case study analysis) that require DCCS architectures to perform their tasks. Finally, we describe the open challenges and the developments needed for DCCSs to unleash their widespread potential across the majority of applications.
{"title":"Exploring the Potential of Distributed Computing Continuum Systems","authors":"Praveen Kumar Donta, Ilir Murturi, Victor Casamayor Pujol, Boris Sedlak, Schahram Dustdar","doi":"10.3390/computers12100198","DOIUrl":"https://doi.org/10.3390/computers12100198","url":null,"abstract":"Computing paradigms have evolved significantly in recent decades, moving from large room-sized resources (processors and memory) to incredibly small computing nodes. Recently, the power of computing has attracted almost all current application fields. Currently, distributed computing continuum systems (DCCSs) are unleashing the era of a computing paradigm that unifies various computing resources, including cloud, fog/edge computing, the Internet of Things (IoT), and mobile devices into a seamless and integrated continuum. Its seamless infrastructure efficiently manages diverse processing loads and ensures a consistent user experience. Furthermore, it provides a holistic solution to meet modern computing needs. In this context, this paper presents a deeper understanding of DCCSs’ potential in today’s computing environment. First, we discuss the evolution of computing paradigms up to DCCS. The general architectures, components, and various computing devices are discussed, and the benefits and limitations of each computing paradigm are analyzed. After that, our discussion continues into various computing devices that constitute part of DCCS to achieve computational goals in current and futuristic applications. In addition, we delve into the key features and benefits of DCCS from the perspective of current computing needs. Furthermore, we provide a comprehensive overview of emerging applications (with a case study analysis) that desperately need DCCS architectures to perform their tasks. Finally, we describe the open challenges and possible developments that need to be made to DCCS to unleash its widespread potential for the majority of applications.","PeriodicalId":46292,"journal":{"name":"Computers","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135898017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-09-29 | DOI: 10.3390/computers12100197
Comparison of Automated Machine Learning (AutoML) Tools for Epileptic Seizure Detection Using Electroencephalograms (EEG)
Swetha Lenkala, Revathi Marry, Susmitha Reddy Gopovaram, Tahir Cetin Akinci, Oguzhan Topsakal
Epilepsy is a neurological disease characterized by recurrent seizures caused by abnormal electrical activity in the brain. One of the methods used to diagnose epilepsy is electroencephalogram (EEG) analysis. EEG is a non-invasive medical test for quantifying electrical activity in the brain. Applying machine learning (ML) to EEG data for epilepsy diagnosis has the potential to be more accurate and efficient. However, expert knowledge is required to set up an ML model with the correct hyperparameters. Automated machine learning (AutoML) tools aim to make ML more accessible to non-experts and to automate many ML processes to create a high-performing model. This article explores the use of AutoML tools for diagnosing epilepsy from EEG data. The study compares the performance of three AutoML tools, AutoGluon, Auto-Sklearn, and Amazon SageMaker, on three datasets, drawn from the UC Irvine ML Repository, the Bonn EEG time series dataset, and Zenodo. Performance measures used for evaluation include accuracy, F1 score, recall, and precision. The results show that all three AutoML tools were able to generate high-performing ML models for the diagnosis of epilepsy, and the generated models perform better when the training dataset is larger; Amazon SageMaker and Auto-Sklearn performed better with smaller datasets. This is the first study to compare several AutoML tools, and it shows that AutoML can be used to create well-performing solutions for diagnosing epilepsy from hard-to-analyze EEG time-series data.
{"title":"Comparison of Automated Machine Learning (AutoML) Tools for Epileptic Seizure Detection Using Electroencephalograms (EEG)","authors":"Swetha Lenkala, Revathi Marry, Susmitha Reddy Gopovaram, Tahir Cetin Akinci, Oguzhan Topsakal","doi":"10.3390/computers12100197","DOIUrl":"https://doi.org/10.3390/computers12100197","url":null,"abstract":"Epilepsy is a neurological disease characterized by recurrent seizures caused by abnormal electrical activity in the brain. One of the methods used to diagnose epilepsy is through electroencephalogram (EEG) analysis. EEG is a non-invasive medical test for quantifying electrical activity in the brain. Applying machine learning (ML) to EEG data for epilepsy diagnosis has the potential to be more accurate and efficient. However, expert knowledge is required to set up the ML model with correct hyperparameters. Automated machine learning (AutoML) tools aim to make ML more accessible to non-experts and automate many ML processes to create a high-performing ML model. This article explores the use of automated machine learning (AutoML) tools for diagnosing epilepsy using electroencephalogram (EEG) data. The study compares the performance of three different AutoML tools, AutoGluon, Auto-Sklearn, and Amazon Sagemaker, on three different datasets from the UC Irvine ML Repository, Bonn EEG time series dataset, and Zenodo. Performance measures used for evaluation include accuracy, F1 score, recall, and precision. The results show that all three AutoML tools were able to generate high-performing ML models for the diagnosis of epilepsy. The generated ML models perform better when the training dataset is larger in size. Amazon Sagemaker and Auto-Sklearn performed better with smaller datasets. This is the first study to compare several AutoML tools and shows that AutoML tools can be utilized to create well-performing solutions for the diagnosis of epilepsy via processing hard-to-analyze EEG timeseries data.","PeriodicalId":46292,"journal":{"name":"Computers","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135199024","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-09-28 | DOI: 10.3390/computers12100195
Rapidrift: Elementary Techniques to Improve Machine Learning-Based Malware Detection
Abishek Manikandaraja, Peter Aaby, Nikolaos Pitropakis
Artificial intelligence and machine learning have become a necessary part of modern living alongside the increased adoption of new computational devices. Because machine learning and artificial intelligence detect malware better than traditional signature-based detection, malware authors develop new variants aimed at bypassing detection, which causes models to experience concept drift: as new malware samples appear, detection performance drops. Our work discusses this performance degradation of machine learning-based malware detectors over time, known as concept drift. To this end, we develop a Python-based framework, Rapidrift, capable of analysing concept drift at a more granular level. We also created two new malware datasets, TRITIUM and INFRENO, drawn from different sources and threat profiles, to conduct a deeper analysis of the concept drift problem. To test the effectiveness of Rapidrift, various fundamental methods that could reduce the effects of concept drift were explored experimentally.
{"title":"Rapidrift: Elementary Techniques to Improve Machine Learning-Based Malware Detection","authors":"Abishek Manikandaraja, Peter Aaby, Nikolaos Pitropakis","doi":"10.3390/computers12100195","DOIUrl":"https://doi.org/10.3390/computers12100195","url":null,"abstract":"Artificial intelligence and machine learning have become a necessary part of modern living along with the increased adoption of new computational devices. Because machine learning and artificial intelligence can detect malware better than traditional signature detection, the development of new and novel malware aiming to bypass detection has caused a challenge where models may experience concept drift. However, as new malware samples appear, the detection performance drops. Our work aims to discuss the performance degradation of machine learning-based malware detectors with time, also called concept drift. To achieve this goal, we develop a Python-based framework, namely Rapidrift, capable of analysing the concept drift at a more granular level. We also created two new malware datasets, TRITIUM and INFRENO, from different sources and threat profiles to conduct a deeper analysis of the concept drift problem. To test the effectiveness of Rapidrift, various fundamental methods that could reduce the effects of concept drift were experimentally explored.","PeriodicalId":46292,"journal":{"name":"Computers","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135425641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-09-28 | DOI: 10.3390/computers12100196
An Improved Dandelion Optimizer Algorithm for Spam Detection: Next-Generation Email Filtering System
Mohammad Tubishat, Feras Al-Obeidat, Ali Safaa Sadiq, Seyedali Mirjalili
Spam emails have become a pervasive issue in recent years, as internet users receive increasing amounts of unwanted or fake emails. To combat this issue, automatic spam detection methods have been proposed, which aim to classify emails into spam and non-spam categories, and machine learning techniques have been applied to this task with considerable success. In this paper, we introduce a novel approach to spam email detection by presenting significant advancements to the Dandelion Optimizer (DO) algorithm. The DO is a relatively new nature-inspired optimization algorithm modeled on the flight of dandelion seeds. While the DO shows promise, it faces challenges, especially in high-dimensional problems such as feature selection for spam detection. Our primary contributions focus on enhancing the DO algorithm. First, we introduce a new local search algorithm based on flipping (LSAF), designed to improve the DO’s ability to find the best solutions. Second, we propose a reduction equation that shrinks the population size during algorithm execution, reducing computational complexity. To showcase the effectiveness of our modified DO algorithm, which we refer to as the Improved DO (IDO), we conduct a comprehensive evaluation using the Spambase dataset from the UCI repository; we emphasize, however, that our primary objective is to advance the DO algorithm, with spam email detection serving as a case study application. Comparative analysis against several popular algorithms, including Particle Swarm Optimization (PSO), the Genetic Algorithm (GA), Generalized Normal Distribution Optimization (GNDO), the Chimp Optimization Algorithm (ChOA), the Grasshopper Optimization Algorithm (GOA), the Ant Lion Optimizer (ALO), and the Dragonfly Algorithm (DA), demonstrates the superior performance of the proposed IDO algorithm: it excels in accuracy, fitness, and the number of selected features, among other metrics. Our results clearly indicate that the IDO overcomes the local optima problem commonly associated with the standard DO, owing to the incorporation of LSAF and the population reduction equation. In summary, the IDO represents a promising approach for solving high-dimensional optimization problems, with a keen focus on practical applications in real-world systems; it is efficient, accurate, and outperforms several state-of-the-art algorithms on various metrics. This work opens avenues for enhancing optimization techniques and their applications in machine learning.
{"title":"An Improved Dandelion Optimizer Algorithm for Spam Detection: Next-Generation Email Filtering System","authors":"Mohammad Tubishat, Feras Al-Obeidat, Ali Safaa Sadiq, Seyedali Mirjalili","doi":"10.3390/computers12100196","DOIUrl":"https://doi.org/10.3390/computers12100196","url":null,"abstract":"Spam emails have become a pervasive issue in recent years, as internet users receive increasing amounts of unwanted or fake emails. To combat this issue, automatic spam detection methods have been proposed, which aim to classify emails into spam and non-spam categories. Machine learning techniques have been utilized for this task with considerable success. In this paper, we introduce a novel approach to spam email detection by presenting significant advancements to the Dandelion Optimizer (DO) algorithm. The DO is a relatively new nature-inspired optimization algorithm inspired by the flight of dandelion seeds. While the DO shows promise, it faces challenges, especially in high-dimensional problems such as feature selection for spam detection. Our primary contributions focus on enhancing the DO algorithm. Firstly, we introduce a new local search algorithm based on flipping (LSAF), designed to improve the DO’s ability to find the best solutions. Secondly, we propose a reduction equation that streamlines the population size during algorithm execution, reducing computational complexity. To showcase the effectiveness of our modified DO algorithm, which we refer to as the Improved DO (IDO), we conduct a comprehensive evaluation using the Spam base dataset from the UCI repository. However, we emphasize that our primary objective is to advance the DO algorithm, with spam email detection serving as a case study application. Comparative analysis against several popular algorithms, including Particle Swarm Optimization (PSO), the Genetic Algorithm (GA), Generalized Normal Distribution Optimization (GNDO), the Chimp Optimization Algorithm (ChOA), the Grasshopper Optimization Algorithm (GOA), Ant Lion Optimizer (ALO), and the Dragonfly Algorithm (DA), demonstrates the superior performance of our proposed IDO algorithm. It excels in accuracy, fitness, and the number of selected features, among other metrics. Our results clearly indicate that the IDO overcomes the local optima problem commonly associated with the standard DO algorithm, owing to the incorporation of LSAF and the reduction in equation methods. In summary, our paper underscores the significant advancement made in the form of the IDO algorithm, which represents a promising approach for solving high-dimensional optimization problems, with a keen focus on practical applications in real-world systems. While we employ spam email detection as a case study, our primary contribution lies in the improved DO algorithm, which is efficient, accurate, and outperforms several state-of-the-art algorithms in various metrics. 
This work opens avenues for enhancing optimization techniques and their applications in machine learning.","PeriodicalId":46292,"journal":{"name":"Computers","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135425073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
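The paper's exact formulas are not reproduced here, but the two ingredients can be sketched in a hedged, schematic form: a bit-flip local search over a binary feature mask (the idea behind LSAF) and a linearly shrinking population size (one plausible form of the reduction equation); function names and parameters are illustrative.

```python
# Hedged, schematic sketch of the two IDO ingredients (not the paper's formulas).
import random

def lsaf(solution, fitness, flips=5):
    """Flip a few random bits of a binary mask; keep a flip only if fitness improves."""
    best = solution[:]
    for _ in range(flips):
        i = random.randrange(len(best))
        cand = best[:]
        cand[i] = 1 - cand[i]
        if fitness(cand) > fitness(best):
            best = cand
    return best

def population_size(initial, minimum, t, t_max):
    """One plausible linear reduction of population size with iteration t."""
    return round(minimum + (initial - minimum) * (1 - t / t_max))

# Toy fitness: prefer masks that select about 10 features.
fit = lambda s: -abs(sum(s) - 10)
print(lsaf([random.randint(0, 1) for _ in range(30)], fit))
print([population_size(50, 10, t, 100) for t in (0, 50, 100)])
```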
Pub Date: 2023-09-27 | DOI: 10.3390/computers12100194
Predictive Modeling of Student Dropout in MOOCs and Self-Regulated Learning
Georgios Psathas, Theano K. Chatzidaki, Stavros N. Demetriadis
The primary objective of this study is to examine the factors that contribute to the early prediction of dropout from Massive Open Online Courses (MOOCs) in order to identify and support at-risk students. We utilize MOOC data of a specific duration, with a guided study pace. The dataset exhibits class imbalance, and we apply oversampling techniques to balance the data and avoid biased prediction. We examine the predictive performance of five classic classification machine learning (ML) algorithms under four different oversampling techniques and various evaluation metrics. Additionally, we explore the influence of self-reported self-regulated learning (SRL) data provided by students, as well as other prominent MOOC features, as potential indicators of early-stage dropout prediction. The research questions focus on (1) the performance of the classic classification ML models under various evaluation metrics before and after the different methods of oversampling, (2) which self-reported data may constitute crucial predictors of dropout propensity, and (3) the effect of the SRL factor on dropout prediction performance. The main conclusions are: (1) prominent predictors, including employment status, frequency of chat tool usage, prior subject-related experiences, gender, education, and willingness to participate, exhibit remarkable efficacy in achieving high to excellent recall, particularly when specific combinations of algorithms and oversampling methods are applied; (2) the self-reported SRL factor, combined with easily provided/self-reported features, performed well as a predictor in terms of recall when logistic regression (LR) and support vector machine (SVM) algorithms were employed; and (3) it is crucial to test diverse machine learning algorithms and oversampling methods in predictive modeling.
{"title":"Predictive Modeling of Student Dropout in MOOCs and Self-Regulated Learning","authors":"Georgios Psathas, Theano K. Chatzidaki, Stavros N. Demetriadis","doi":"10.3390/computers12100194","DOIUrl":"https://doi.org/10.3390/computers12100194","url":null,"abstract":"The primary objective of this study is to examine the factors that contribute to the early prediction of Massive Open Online Courses (MOOCs) dropouts in order to identify and support at-risk students. We utilize MOOC data of specific duration, with a guided study pace. The dataset exhibits class imbalance, and we apply oversampling techniques to ensure data balancing and unbiased prediction. We examine the predictive performance of five classic classification machine learning (ML) algorithms under four different oversampling techniques and various evaluation metrics. Additionally, we explore the influence of self-reported self-regulated learning (SRL) data provided by students and various other prominent features of MOOCs as potential indicators of early stage dropout prediction. The research questions focus on (1) the performance of the classic classification ML models using various evaluation metrics before and after different methods of oversampling, (2) which self-reported data may constitute crucial predictors for dropout propensity, and (3) the effect of the SRL factor on the dropout prediction performance. The main conclusions are: (1) prominent predictors, including employment status, frequency of chat tool usage, prior subject-related experiences, gender, education, and willingness to participate, exhibit remarkable efficacy in achieving high to excellent recall performance, particularly when specific combinations of algorithms and oversampling methods are applied, (2) self-reported SRL factor, combined with easily provided/self-reported features, performed well as a predictor in terms of recall when LR and SVM algorithms were employed, (3) it is crucial to test diverse machine learning algorithms and oversampling methods in predictive modeling.","PeriodicalId":46292,"journal":{"name":"Computers","volume":"139 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135538828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-09-26 | DOI: 10.3390/computers12100193
Prospective ICT Teachers’ Perceptions on the Didactic Utility and Player Experience of a Serious Game for Safe Internet Use and Digital Intelligence Competencies
Aikaterini Georgiadou, Stelios Xinogalos
Nowadays, young students spend a lot of time playing video games and browsing the Internet. Internet use became even more widespread among young students during the COVID-19 pandemic lockdown, which moved several educational activities online. The Internet, and the digital world we live in generally, offers many possibilities in our everyday lives, but it also entails dangers such as cyber threats and the unethical use of personal data. It is widely accepted that everyone, especially young students, should be educated on safe Internet use and supported in acquiring other Digital Intelligence (DI) competencies as well. Towards this goal, we present the design and evaluation of the game “Follow the Paws”, which aims to educate primary school students on safe Internet use and support them in acquiring relevant DI competencies. The game was designed with the relevant literature in mind and was evaluated by 213 prospective Information and Communication Technology (ICT) teachers. The participants playtested the game and evaluated it through an online questionnaire based on validated instruments proposed in the literature. The participants rated the didactic utility of the game and the anticipated player experience positively, while highlighting several improvements to be considered in a future revision of the game. Based on the results, proposals for further research are presented, including detecting DI competencies through the game and evaluating its actual effectiveness in the classroom.
{"title":"Prospective ICT Teachers’ Perceptions on the Didactic Utility and Player Experience of a Serious Game for Safe Internet Use and Digital Intelligence Competencies","authors":"Aikaterini Georgiadou, Stelios Xinogalos","doi":"10.3390/computers12100193","DOIUrl":"https://doi.org/10.3390/computers12100193","url":null,"abstract":"Nowadays, young students spend a lot of time playing video games and browsing on the Internet. Using the Internet has become even more widespread for young students due to the COVID-19 pandemic lockdown, which resulted in transferring several educational activities online. The Internet and generally the digital world that we live in offers many possibilities in our everyday lives, but it also entails dangers such as cyber threats and unethical use of personal data. It is widely accepted that everyone, especially young students, should be educated on safe Internet use and should be supported on acquiring other Digital Intelligence (DI) competencies as well. Towards this goal, we present the design and evaluation of the game “Follow the Paws” that aims to educate primary school students on safe Internet use and support them in acquiring relevant DI competencies. The game was designed taking into account relevant literature and was evaluated by 213 prospective Information and Communication Technology (ICT) teachers. The participants playtested the game and evaluated it through an online questionnaire that was based on validated instruments proposed in the literature. The participants evaluated positively to the didactic utility of the game and the anticipated player experience, while they highlighted several improvements to be taken into consideration in a future revision of the game. Based on the results, proposals for further research are presented, including DI competencies detection through the game and evaluating its actual effectiveness in the classroom.","PeriodicalId":46292,"journal":{"name":"Computers","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134961135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-09-24 | DOI: 10.3390/computers12100192
Explain Trace: Misconceptions of Control-Flow Statements
Oleg Sychev, Mikhail Denisov
Control-flow statements often cause misunderstandings among novice computer science students. To better address these problems, teachers need to know the misconceptions that are typical at this stage. In this paper, we present the results of studying students’ misconceptions about control-flow statements. We compiled 181 questions, each containing an algorithm written in pseudocode and the execution trace of that algorithm. Some of the traces were correct; others contained highlighted errors. The students were asked to explain in their own words why the selected line of the trace was correct or erroneous. We collected and processed 10,799 answers from 67 CS1 students. Among the 24 misconceptions we found, 6 coincided with misconceptions from other studies, and 7 were narrower cases of known misconceptions. We did not find previous research regarding 11 of the misconceptions we identified.
{"title":"Explain Trace: Misconceptions of Control-Flow Statements","authors":"Oleg Sychev, Mikhail Denisov","doi":"10.3390/computers12100192","DOIUrl":"https://doi.org/10.3390/computers12100192","url":null,"abstract":"Control-flow statements often cause misunderstandings among novice computer science students. To better address these problems, teachers need to know the misconceptions that are typical at this stage. In this paper, we present the results of studying students’ misconceptions about control-flow statements. We compiled 181 questions, each containing an algorithm written in pseudocode and the execution trace of that algorithm. Some of the traces were correct; others contained highlighted errors. The students were asked to explain in their own words why the selected line of the trace was correct or erroneous. We collected and processed 10,799 answers from 67 CS1 students. Among the 24 misconceptions we found, 6 coincided with misconceptions from other studies, and 7 were narrower cases of known misconceptions. We did not find previous research regarding 11 of the misconceptions we identified.","PeriodicalId":46292,"journal":{"name":"Computers","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135925657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2023-09-23 | DOI: 10.3390/computers12100191
Analyzing Public Reactions, Perceptions, and Attitudes during the MPox Outbreak: Findings from Topic Modeling of Tweets
Nirmalya Thakur, Yuvraj Nihal Duggal, Zihui Liu
In the last decade and a half, the world has experienced outbreaks of a range of viruses such as COVID-19, H1N1, flu, Ebola, Zika virus, Middle East Respiratory Syndrome (MERS), measles, and West Nile virus, just to name a few. During these virus outbreaks, the usage and effectiveness of social media platforms increased significantly, as such platforms served as virtual communities, enabling their users to share and exchange information, news, perspectives, opinions, ideas, and comments related to the outbreaks. Analysis of this big data of outbreak-related conversations using Natural Language Processing concepts such as Topic Modeling has attracted the attention of researchers from different disciplines such as Healthcare, Epidemiology, Data Science, Medicine, and Computer Science. The recent outbreak of the Mpox virus has resulted in a tremendous increase in the usage of Twitter. Prior works in this area have primarily focused on the sentiment analysis and content analysis of these Tweets, and the few works that have focused on topic modeling have multiple limitations. This paper aims to address this research gap and makes two scientific contributions to this field. First, it presents the results of performing Topic Modeling on 601,432 Tweets about the 2022 Mpox outbreak that were posted on Twitter between 7 May 2022 and 3 March 2023. The results indicate that the conversations on Twitter related to Mpox during this time range may be broadly categorized into four distinct themes: Views and Perspectives about Mpox, Updates on Cases and Investigations about Mpox, Mpox and the LGBTQIA+ Community, and Mpox and COVID-19. Second, the paper presents the findings from the analysis of these Tweets. The results show that the theme most popular on Twitter (in terms of the number of Tweets posted) during this time range was Views and Perspectives about Mpox, followed by Mpox and the LGBTQIA+ Community, then Mpox and COVID-19, and finally Updates on Cases and Investigations about Mpox. Finally, a comparison with related studies in this area is presented to highlight the novelty and significance of this research work.
{"title":"Analyzing Public Reactions, Perceptions, and Attitudes during the MPox Outbreak: Findings from Topic Modeling of Tweets","authors":"Nirmalya Thakur, Yuvraj Nihal Duggal, Zihui Liu","doi":"10.3390/computers12100191","DOIUrl":"https://doi.org/10.3390/computers12100191","url":null,"abstract":"In the last decade and a half, the world has experienced outbreaks of a range of viruses such as COVID-19, H1N1, flu, Ebola, Zika virus, Middle East Respiratory Syndrome (MERS), measles, and West Nile virus, just to name a few. During these virus outbreaks, the usage and effectiveness of social media platforms increased significantly, as such platforms served as virtual communities, enabling their users to share and exchange information, news, perspectives, opinions, ideas, and comments related to the outbreaks. Analysis of this Big Data of conversations related to virus outbreaks using concepts of Natural Language Processing such as Topic Modeling has attracted the attention of researchers from different disciplines such as Healthcare, Epidemiology, Data Science, Medicine, and Computer Science. The recent outbreak of the MPox virus has resulted in a tremendous increase in the usage of Twitter. Prior works in this area of research have primarily focused on the sentiment analysis and content analysis of these Tweets, and the few works that have focused on topic modeling have multiple limitations. This paper aims to address this research gap and makes two scientific contributions to this field. First, it presents the results of performing Topic Modeling on 601,432 Tweets about the 2022 Mpox outbreak that were posted on Twitter between 7 May 2022 and 3 March 2023. The results indicate that the conversations on Twitter related to Mpox during this time range may be broadly categorized into four distinct themes—Views and Perspectives about Mpox, Updates on Cases and Investigations about Mpox, Mpox and the LGBTQIA+ Community, and Mpox and COVID-19. Second, the paper presents the findings from the analysis of these Tweets. The results show that the theme that was most popular on Twitter (in terms of the number of Tweets posted) during this time range was Views and Perspectives about Mpox. This was followed by the theme of Mpox and the LGBTQIA+ Community, which was followed by the themes of Mpox and COVID-19 and Updates on Cases and Investigations about Mpox, respectively. Finally, a comparison with related studies in this area of research is also presented to highlight the novelty and significance of this research work.","PeriodicalId":46292,"journal":{"name":"Computers","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135967130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}