Research on the relationship and prediction model between nighttime lighting data, PM2.5 data, and urban GDP
Pub Date: 2025-09-16 | eCollection Date: 2025-01-01 | DOI: 10.7717/peerj-cs.3185
Sen Chen, Junke Li
With the discovery of electricity and the widespread adoption of lighting technology, the extensive application of electricity has greatly increased productivity, making night-time factory production possible. At the same time, the rapid expansion of factories has led to a significant increase in particulate matter 2.5 (PM2.5) in the air. Economic development, however, continues to rely heavily on lighting and factory production. To address this tension, researchers have predicted urban gross domestic product (GDP) from night-time lights and PM2.5, but current studies often consider only a single factor's impact on GDP, leaving room for improvement in model accuracy. In response, this article proposes the Relationship and Prediction Model between Night Light Data, PM2.5, and Urban GDP (R&P-NLPG model). First, night-light, PM2.5, and GDP data are collected and preprocessed. Second, correlation analysis is conducted to quantify the relationships between data features. Then, data fusion methods integrate the night-light and PM2.5 features into a third set of features. Next, a neural network is constructed to establish a functional relationship between the features and GDP. Finally, the trained neural network is used to predict GDP. Experimental results demonstrate that the R&P-NLPG model outperforms GDP prediction models built on single-feature input as well as existing multi-feature models.
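To make the pipeline's shape concrete, below is a minimal sketch of its stages in Python: correlation analysis, fusion of the two inputs into a third feature, and a small neural regressor. The synthetic data, the z-score product used as the fusion rule, and the MLP size are illustrative assumptions, not the authors' configuration.

```python
# Minimal sketch of an R&P-NLPG-style pipeline: correlation check,
# feature fusion, and an MLP regressor. Data, column meanings, and the
# product-based fusion rule are illustrative assumptions.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 200                                   # hypothetical city-year records
nightlight = rng.uniform(0, 60, n)        # mean night-light intensity
pm25 = rng.uniform(5, 120, n)             # mean PM2.5 concentration
gdp = 1.5 * nightlight + 0.4 * pm25 + rng.normal(0, 5, n)  # toy target

# Correlation analysis between each feature and GDP.
for name, x in [("nightlight", nightlight), ("pm25", pm25)]:
    r = np.corrcoef(x, gdp)[0, 1]
    print(f"corr({name}, gdp) = {r:.3f}")

# Fuse the two features into a third one (assumed: product of z-scores).
scaler = StandardScaler()
X = scaler.fit_transform(np.column_stack([nightlight, pm25]))
fused = (X[:, 0] * X[:, 1]).reshape(-1, 1)
X = np.hstack([X, fused])

X_tr, X_te, y_tr, y_te = train_test_split(X, gdp, random_state=0)
model = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0)
model.fit(X_tr, y_tr)
print("R^2 on held-out data:", model.score(X_te, y_te))
```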
{"title":"Research on the relationship and prediction model between nighttime lighting data, pm2.5 data, and urban GDP.","authors":"Sen Chen, Junke Li","doi":"10.7717/peerj-cs.3185","DOIUrl":"10.7717/peerj-cs.3185","url":null,"abstract":"<p><p>With the discovery of electricity and the widespread adoption of lighting technology, the extensive application of electricity has greatly increased productivity, making night-time factory production possible. At the same time, the rapid expansion of factories has led to a significant increase in particulate matter 2.5 (PM2.5) in the air. However, economic development heavily relies on lighting and factory production. To address this issue, researchers have focused on predicting urban gross domestic product (GDP) through night-time lights and PM2.5, but current studies often focus on the impact of a single factor on GDP, leaving room for improvement in model accuracy. In response to this problem, this article proposes the Relationship and Prediction Model between Night Light Data, PM2.5, and Urban GDP (R&P-NLPG model). Firstly, night light data, PM2.5 data, and GDP data are collected and preprocessed. Secondly, correlation analysis is conducted to analyze the correlation between data features. Then, data fusion methods are used to integrate features between night-time data and PM2.5 data, forming the third data features. Next, a neural network is constructed to establish a functional relationship between features and GDP. Finally, the trained neural network model is used to predict GDP. The experimental results demonstrate that the predictive capability of the R&P-NLPG model outperforms GDP prediction models constructed with single-feature input and existing multi-feature input.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e3185"},"PeriodicalIF":2.5,"publicationDate":"2025-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453825/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145132667","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enhancing privacy-preserving brain tumor classification with adaptive reputation-aware federated learning and homomorphic encryption
Pub Date: 2025-09-16 | eCollection Date: 2025-01-01 | DOI: 10.7717/peerj-cs.3165
Swetha Ghanta, Prasanthi Boyapati, Sujit Biswas, Ashok K Pradhan, Saraju P Mohanty
Brain tumor diagnosis using magnetic resonance imaging (MRI) scans is critical for improving patient survival rates. However, automating the analysis of these scans faces significant challenges, including data privacy concerns and the scarcity of large, diverse datasets. A potential solution is federated learning (FL), which enables cooperative model training among multiple organizations without sharing raw data, though it introduces challenges of its own. To address these, we propose Federated Adaptive Reputation-aware aggregation with CKKS (Cheon-Kim-Kim-Song) Homomorphic encryption (FedARCH), a novel FL framework designed for the cross-silo scenario, in which client weights are aggregated according to reputation scores derived from performance evaluations. Our framework incorporates a weighted aggregation method using these reputation scores to enhance the robustness of the global model. To absorb sudden changes in client performance, a smoothing factor is introduced, while a decay factor ensures that recent updates have greater influence on the global model; together these factors provide dynamic performance management. Additionally, we address privacy risks from model inversion attacks by implementing a simplified, computationally efficient CKKS homomorphic encryption scheme that allows secure operations on encrypted data. In FedARCH, each client's encrypted model weights are multiplied by a plaintext reputation score for weighted aggregation. Because ciphertexts are multiplied by plaintexts rather than by other ciphertexts, relinearization is unnecessary, reducing computational overhead. FedARCH achieved an accuracy of 99.39%, highlighting its potential for distinguishing between brain tumor classes. Several experiments were conducted by adding noise to the clients' data and varying the number of noisy clients: accuracy remained at 94% even with 50% noisy clients at a high noise level, while the standard FL approach dropped to 33%. Our results and security analysis demonstrate the effectiveness of FedARCH in improving model accuracy, its robustness to noisy data, and its ability to ensure data privacy, making it a viable approach for medical image analysis in federated settings.
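The aggregation step can be sketched in the clear as below; in FedARCH the client weights would be CKKS ciphertexts and only the reputation scores plaintext, whereas this toy version skips encryption entirely. The exact update rule and the parameter names alpha (smoothing) and gamma (decay) are assumptions.

```python
# Hedged sketch of reputation-weighted aggregation in the clear (the
# paper multiplies CKKS-encrypted weights by these plaintext scores;
# encryption is omitted here).
import numpy as np

def update_reputation(prev_rep, perf, alpha=0.7, gamma=0.9):
    """Blend new performance into the running reputation.

    alpha smooths sudden performance changes; gamma decays the old
    reputation so recent rounds weigh more. Both names are assumptions.
    """
    return alpha * (gamma * prev_rep) + (1 - alpha) * perf

def aggregate(client_weights, reputations):
    """Reputation-weighted average of client weight vectors."""
    rep = np.asarray(reputations, dtype=float)
    rep = rep / rep.sum()                      # normalize to sum to 1
    stacked = np.stack(client_weights)         # (n_clients, n_params)
    return rep @ stacked                       # weighted sum

# Toy round: three clients, per-round validation accuracy as performance.
weights = [np.array([1.0, 2.0]), np.array([1.2, 1.8]), np.array([5.0, 5.0])]
reps = [update_reputation(0.8, p) for p in (0.92, 0.90, 0.35)]  # noisy 3rd client
print("reputations:", np.round(reps, 3))
print("global weights:", aggregate(weights, reps))
```

The noisy third client ends up with a low reputation, so its outlier weights contribute little to the global model, which is the robustness property the abstract describes.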
{"title":"Enhancing privacy-preserving brain tumor classification with adaptive reputation-aware federated learning and homomorphic encryption.","authors":"Swetha Ghanta, Prasanthi Boyapati, Sujit Biswas, Ashok K Pradhan, Saraju P Mohanty","doi":"10.7717/peerj-cs.3165","DOIUrl":"10.7717/peerj-cs.3165","url":null,"abstract":"<p><p>Brain tumor diagnosis using magnetic resonance imaging (MRI) scans is critical for improving patient survival rates. However, automating the analysis of these scans faces significant challenges, including data privacy concerns and the scarcity of large, diverse datasets. A potential solution is federated learning (FL), which enables cooperative model training among multiple organizations without requiring the sharing of raw data; however, it faces various challenges. To address these, we propose Federated Adaptive Reputation-aware aggregation with CKKS (Cheon-Kim-Kim-Song) Homomorphic encryption (FedARCH), a novel FL framework designed for a cross-silo scenario, where client weights are aggregated based on reputation scores derived from performance evaluations. Our framework incorporates a weighted aggregation method using these reputation scores to enhance the robustness of the global model. To address sudden changes in client performance, a smoothing factor is introduced, while a decay factor ensures that recent updates have a greater influence on the global model. These factors work together for dynamic performance management. Additionally, we address potential privacy risks from model inversion attacks by implementing a simplified and computationally efficient CKKS homomorphic encryption, which allows secure operations on encrypted data. With FedARCH, encrypted model weights of each client are multiplied by a plaintext reputation score for weighted aggregation. Since we are multiplying ciphertexts by plaintexts, instead of ciphertexts, the need for relinearization is eliminated, efficiently reducing the computational overhead. FedARCH achieved an accuracy of 99.39%, highlighting its potential in distinguishing between brain tumor classes. Several experiments were conducted by adding noise to the clients' data and varying the number of noisy clients. An accuracy of 94% was maintained even with 50% of noisy clients at a high noise level, while the standard FL approach accuracy dropped to 33%. Our results and the security analysis demonstrate the effectiveness of FedARCH in improving model accuracy, its robustness to noisy data, and its ability to ensure data privacy, making it a viable approach for medical image analysis in federated settings.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e3165"},"PeriodicalIF":2.5,"publicationDate":"2025-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453705/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145132401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A comprehensive approach for waste management with GAN-augmented classification
Pub Date: 2025-09-16 | eCollection Date: 2025-01-01 | DOI: 10.7717/peerj-cs.3156
Yashashree Mahale, Nida Khan, Kunal Kulkarni, Shilpa Gite, Biswajeet Pradhan, Abdullah Alamri, Chang-Wook Lee, Nandhini K, Mrinal Bachute
Image processing and computer vision rely heavily on data augmentation in machine learning models to increase the diversity and variability of training datasets for better performance. One of the most promising and widely used applications of data augmentation is classifying waste object images. This research focuses on augmenting waste object images with generative adversarial networks (GANs). Here, a deep convolutional GAN (DCGAN), an extension of the GAN that uses convolutional and convolutional-transpose layers, is utilized to generate images with greater realism and variability. Object detection and classification techniques are then applied. By utilizing ensemble learning with DenseNet121, ConvNeXt, and ResNet101, the network can accurately identify and classify waste objects in images, contributing to improved waste management practices and environmental sustainability. With ensemble learning, a notable accuracy of 99.80% was achieved. By investigating the effectiveness of these models in conjunction with data augmentation techniques, this approach of GAN-based augmentation coupled with ensemble models aims to provide valuable insights into optimizing waste object identification for real-world applications. Future work will focus on improved data augmentation with other GAN architectures and on introducing multimodal data sources to further increase the performance of the classification and detection models.
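For reference, a minimal DCGAN-style generator looks like the following: stacked convolutional-transpose layers with batch normalization and ReLU, ending in Tanh. The 100-dimensional latent and 64x64 RGB output are conventional DCGAN defaults, assumed here rather than taken from the paper.

```python
# Minimal DCGAN-style generator of the kind used to synthesize extra
# waste-object images: ConvTranspose2d + BatchNorm + ReLU blocks that
# upsample a latent vector to a 64x64 RGB image in [-1, 1].
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, z_dim=100, feat=64, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim, feat * 8, 4, 1, 0, bias=False),     # 1 -> 4
            nn.BatchNorm2d(feat * 8), nn.ReLU(True),
            nn.ConvTranspose2d(feat * 8, feat * 4, 4, 2, 1, bias=False),  # 4 -> 8
            nn.BatchNorm2d(feat * 4), nn.ReLU(True),
            nn.ConvTranspose2d(feat * 4, feat * 2, 4, 2, 1, bias=False),  # 8 -> 16
            nn.BatchNorm2d(feat * 2), nn.ReLU(True),
            nn.ConvTranspose2d(feat * 2, feat, 4, 2, 1, bias=False),      # 16 -> 32
            nn.BatchNorm2d(feat), nn.ReLU(True),
            nn.ConvTranspose2d(feat, channels, 4, 2, 1, bias=False),      # 32 -> 64
            nn.Tanh(),  # outputs in [-1, 1]
        )

    def forward(self, z):
        return self.net(z)

g = Generator()
fake = g(torch.randn(16, 100, 1, 1))  # a batch of 16 latent vectors
print(fake.shape)                     # torch.Size([16, 3, 64, 64])
```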
{"title":"A comprehensive approach for waste management with GAN-augmented classification.","authors":"Yashashree Mahale, Nida Khan, Kunal Kulkarni, Shilpa Gite, Biswajeet Pradhan, Abdullah Alamri, Chang-Wook Lee, Nandhini K, Mrinal Bachute","doi":"10.7717/peerj-cs.3156","DOIUrl":"10.7717/peerj-cs.3156","url":null,"abstract":"<p><p>Image processing and computer vision highly rely on data augmentation in machine learning models to increase the diversity and variability within training datasets for better performance. One of the most promising and widely used applications of data augmentation is in classifying waste object images. This research focuses on augmenting waste object images with generative adversarial networks (GANS). Here deep convolutional GAN (DCGAN), an extension of GAN is utilized, which uses convolutional and convolutional-transpose layers for better image generation. This approach helps generate realism and variability in images. Furthermore, object detection and classification techniques are used. By utilizing ensemble learning techniques with <i>DenseNet121, ConvNext, and Resnet101</i>, the network can accurately identify and classify waste objects in images, thereby contributing to improved waste management practices and environmental sustainability. With ensemble learning, a notable accuracy of 99.80% was achieved. Thus, by investigating the effectiveness of these models in conjunction with data augmentation techniques, this novel approach of GAN-based augmentation cooperated with ensemble models aims to provide valuable insights into optimizing waste object identification processes for real-world applications. Future work will focus on better data augmentation methods with other types of GANS architectures and introducing multimodal sources of data to further increase the performance of the classification and detection models.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e3156"},"PeriodicalIF":2.5,"publicationDate":"2025-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453751/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145132612","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hyperparameter optimization of XGBoost and hybrid CnnSVM for cyber threat detection using modified Harris hawks algorithm
Pub Date: 2025-09-16 | eCollection Date: 2025-01-01 | DOI: 10.7717/peerj-cs.3169
Haitham Elwahsh, Ali Bakhiet, Tarek Khalifa, Julian Hoxha, Maazen Alsabaan, Mohamed I Ibrahem, Mahmoud Elwahsh, Engy El-Shafeiy
The escalating complexity of cyber threats in smart microgrids necessitates advanced detection frameworks to counter sophisticated attacks. Existing methods often underutilize optimization techniques such as Harris hawks optimization (HHO) and struggle with class imbalance in cybersecurity datasets. This study proposes a novel framework integrating HHO with extreme gradient boosting (XGBoost) and a hybrid convolutional neural network with support vector machine (Cnn-SVM) to enhance cyber threat detection. Using the distributed denial of service (DDoS) botnet attack and KDD CUP99 datasets, the proposed models leverage HHO for hyperparameter optimization, achieving accuracies of 99.97% and 99.99%, respectively, alongside improved area under the curve (AUC) metrics. These results highlight the framework's ability to capture complex nonlinearities and to address class imbalance through RandomOverSampler. The findings demonstrate the potential of HHO-optimized models to advance automated threat detection, offering robust and scalable solutions for securing critical infrastructures.
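A heavily simplified sketch of HHO-driven tuning is shown below: only the decaying escape-energy schedule and the explore/exploit split of the full algorithm are kept, and the search space (learning rate and tree depth on a stock dataset) is an assumption. It presumes xgboost and scikit-learn are installed.

```python
# Heavily simplified Harris hawks optimization (HHO) loop tuning two
# XGBoost hyperparameters; full HHO has several besiege strategies,
# of which only the basic exploit move is kept here.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)        # stand-in dataset
rng = np.random.default_rng(0)
low, high = np.array([0.01, 2]), np.array([0.5, 10])  # lr, max_depth bounds

def fitness(pos):
    lr, depth = float(pos[0]), int(round(pos[1]))
    clf = XGBClassifier(learning_rate=lr, max_depth=depth,
                        n_estimators=50, eval_metric="logloss")
    return cross_val_score(clf, X, y, cv=3).mean()

hawks = rng.uniform(low, high, size=(4, 2))        # small population
scores = np.array([fitness(h) for h in hawks])
T = 5                                              # iterations
for t in range(T):
    best = hawks[scores.argmax()]                  # the "rabbit"
    for i in range(len(hawks)):
        E = 2 * rng.uniform(-1, 1) * (1 - t / T)   # escape energy decays
        if abs(E) >= 1:   # exploration: random relocation within bounds
            cand = rng.uniform(low, high)
        else:             # exploitation: besiege the current best
            cand = best - E * np.abs(best - hawks[i])
        cand = np.clip(cand, low, high)
        s = fitness(cand)
        if s > scores[i]:                          # greedy acceptance
            hawks[i], scores[i] = cand, s
print("best (lr, depth):", hawks[scores.argmax()], "cv acc:", scores.max())
```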
{"title":"Hyperparameter optimization of XGBoost and hybrid CnnSVM for cyber threat detection using modified Harris hawks algorithm.","authors":"Haitham Elwahsh, Ali Bakhiet, Tarek Khalifa, Julian Hoxha, Maazen Alsabaan, Mohamed I Ibrahem, Mahmoud Elwahsh, Engy El-Shafeiy","doi":"10.7717/peerj-cs.3169","DOIUrl":"10.7717/peerj-cs.3169","url":null,"abstract":"<p><p>The escalating complexity of cyber threats in smart microgrids necessitates advanced detection frameworks to counter sophisticated attacks. Existing methods often underutilize optimization techniques like Harris hawks optimization (HHO) and struggle with class imbalance in cybersecurity datasets. This study proposes a novel framework integrating HHO with extreme gradient boosting (XGBoost) and a hybrid convolutional neural network with support vector machine (Cnn-SVM) to enhance cyber threat detection. Using the distributed denial of service (DDoS) botnet attack and KDD CUP99 datasets, the proposed models leverage HHO for hyperparameter optimization, achieving accuracies of 99.97% and 99.99%, respectively, alongside improved area under curve (AUC) metrics. These results highlight the framework's ability to capture complex nonlinearities and address class imbalance through RandomOverSampler. The findings demonstrate the potential of HHO-optimized models to advance automated threat detection, offering robust and scalable solutions for securing critical infrastructures.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e3169"},"PeriodicalIF":2.5,"publicationDate":"2025-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453716/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145132482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Quantification of left ventricular mass in multiple views of echocardiograms using model-agnostic meta learning in a few-shot setting
Pub Date: 2025-09-16 | eCollection Date: 2025-01-01 | DOI: 10.7717/peerj-cs.3161
Yeong Hyeon Kim, Donghoon Kim, Jin Young Youm, Jiyoon Won, Seola Kim, Woohyun Park, Yisak Kim, Dongheon Lee
Background: Reliable measurement of left ventricular mass (LVM) in echocardiography is essential for early detection of left ventricular dysfunction, coronary artery disease, and arrhythmia risk, yet growing patient volumes have created a critical shortage of echocardiography experts. Recent deep learning approaches reduce inter-operator variability but require large, fully labeled datasets for each standard view, an impractical demand in many clinical settings.
Methods: To overcome these limitations, we propose a heatmap-based point-estimation segmentation model trained via model-agnostic meta-learning (MAML) for few-shot LVM quantification across multiple echocardiographic views. Our framework adapts rapidly to new views by learning a shared representation with a view-specific head, performing K inner-loop updates per task and then meta-updating in the outer loop. We used the EchoNet-LVH dataset for the PLAX view, the TMED-2 dataset for the PSAX view, and the CAMUS dataset for both the apical 2-chamber and apical 4-chamber views, under 1-, 5-, and 10-shot scenarios.
Results: In the few-shot setting, the proposed MAML method demonstrated performance comparable to models trained with larger labeled datasets for each echocardiographic view, as measured by mean distance error, mean angle error, successful distance error, and spatial angular similarity.
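A first-order MAML sketch of the inner-/outer-loop scheme follows; the paper's heatmap segmentation heads are replaced by a toy sine-regression family, with each amplitude standing in for a "view", so every size and task choice here is a placeholder.

```python
# First-order MAML: K gradient steps on a task's support set produce
# fast weights, then the query-set gradient of the adapted model is
# applied to the shared initialization (no second-order terms).
import copy
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn, K, inner_lr = nn.MSELoss(), 5, 0.01

def sample_task():
    # Placeholder task family: y = a * sin(x); each amplitude a plays
    # the role of a new "view" with 10 support and 10 query points.
    a = torch.rand(1) * 4 + 1
    x = torch.rand(20, 1) * 6 - 3
    return x[:10], a * torch.sin(x[:10]), x[10:], a * torch.sin(x[10:])

for step in range(1000):
    xs, ys, xq, yq = sample_task()
    learner = copy.deepcopy(model)            # fast weights for this task
    inner_opt = torch.optim.SGD(learner.parameters(), lr=inner_lr)
    for _ in range(K):                        # inner loop: K support steps
        inner_opt.zero_grad()
        loss_fn(learner(xs), ys).backward()
        inner_opt.step()
    query_loss = loss_fn(learner(xq), yq)     # evaluate adapted weights
    grads = torch.autograd.grad(query_loss, learner.parameters())
    meta_opt.zero_grad()
    for p, g in zip(model.parameters(), grads):
        p.grad = g                            # first-order: reuse adapted grads
    meta_opt.step()                           # outer loop: meta-update
```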
{"title":"Quantification of left ventricular mass in multiple views of echocardiograms using model-agnostic meta learning in a few-shot setting.","authors":"Yeong Hyeon Kim, Donghoon Kim, Jin Young Youm, Jiyoon Won, Seola Kim, Woohyun Park, Yisak Kim, Dongheon Lee","doi":"10.7717/peerj-cs.3161","DOIUrl":"10.7717/peerj-cs.3161","url":null,"abstract":"<p><strong>Background: </strong>Reliable measurement of left ventricular mass (LVM) in echocardiography is essential for early detection of left ventricular dysfunction, coronary artery disease, and arrhythmia risk, yet growing patient volumes have created critical shortage of experts in echocardiography. Recent deep learning approaches reduce inter-operator variability but require large, fully labeled datasets for each standard view-an impractical demand in many clinical settings.</p><p><strong>Methods: </strong>To overcome these limitations, we propose a heatmap-based point-estimation segmentation model trained <i>via</i> model-agnostic meta-learning (MAML) for few-shot LVM quantification across multiple echocardiographic views. Our framework adapts rapidly to new views by learning a shared representation and view-specific head performing K inner-loop updates, and then meta-updating in the outer loop. We used the EchoNet-LVH dataset for the PLAX view, the TMED-2 dataset for the PSAX view and the CAMUS dataset for both the apical 2-chamber and apical 4-chamber views under 1-, 5-, and 10-shot scenarios.</p><p><strong>Results: </strong>As a result, the proposed MAML methods demonstrated comparable performance using mean distance error, mean angle error, successful distance error and spatial angular similarity in a few-shot setting compared to models trained with larger labeled datasets for each view of the echocardiogram.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e3161"},"PeriodicalIF":2.5,"publicationDate":"2025-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453733/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145132676","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep learning based cardiac disorder classification and user authentication for smart healthcare system using ECG signals
Tong Ding, Chenhe Liu, Jiasheng Zhang, Yibo Zhang, Cheng Ding
Pub Date: 2025-09-16 | DOI: 10.7717/peerj-cs.3082
Abnormal cardiac activity can lead to severe health complications, making timely diagnosis essential for saving lives. Intelligent telehealth systems have the potential to transform the healthcare industry by continuously monitoring cardiac disease remotely and non-invasively. A cloud-based telehealth system with an Internet of Things (IoT)-enabled electrocardiogram (ECG) monitor gathers and analyzes ECG signals to predict cardiac complications and notify physicians in crises, facilitating prompt and precise diagnosis of cardiovascular disorders. This study provides an efficient method based on deep learning convolutional neural network (CNN) and long short-term memory (LSTM) approaches to detect and categorize cardiovascular problems from ECG data, improving classification (distinguishing between different ECG signal categories) and precision. Additionally, a threshold-based classifier is developed for the telehealth system's security and privacy, enabling user identification (selecting the correct user from a group) from ECG data. Data preprocessing and augmentation were applied to improve data quality and quantity. The proposed LSTM model attained 99.5% accuracy in classifying cardiac diseases and 98.6% accuracy in user authentication from ECG signals, outperforming conventional machine learning and convolutional neural network models.
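As a rough illustration of the LSTM branch of such a model, the skeleton below classifies fixed-length single-lead ECG windows. The 187-sample window (a common MIT-BIH export length) and the five-class output are assumptions, not the article's reported configuration.

```python
# Skeleton LSTM classifier over single-lead ECG windows: a two-layer
# LSTM reads the sequence and a linear head classifies from the last
# hidden state.
import torch
import torch.nn as nn

class ECGLSTM(nn.Module):
    def __init__(self, hidden=64, n_classes=5):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden,
                            num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                 # x: (batch, time, 1)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])      # classify from last time step

model = ECGLSTM()
x = torch.randn(8, 187, 1)                # 8 ECG windows of 187 samples
logits = model(x)
print(logits.shape)                       # torch.Size([8, 5])
# For user authentication, the same backbone could feed a
# threshold-based decision on per-user scores, as the article describes.
```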
{"title":"Deep learning based cardiac disorder classification and user authentication for smart healthcare system using ECG signals.","authors":"Tong Ding, Chenhe Liu, Jiasheng Zhang, Yibo Zhang, Cheng Ding","doi":"10.7717/peerj-cs.3082","DOIUrl":"10.7717/peerj-cs.3082","url":null,"abstract":"<p><p>Abnormal cardiac activity can lead to severe health complications, emphasizing the importance of timely diagnosis. It is essential to save lives if diseases are diagnosed in a reasonable timeframe. The intelligent telehealth system has the potential to transform the healthcare industry by continuously monitoring cardiac diseases remotely and non-invasively. A cloud-based telehealth system utilizing an Internet of Things (IoT)-enabled electrocardiogram (ECG) monitor gathers and analyzes ECG signals to predict cardiac complications and notify physicians in crises, facilitating prompt and precise diagnosis of cardiovascular disorders. Abnormal cardiac activity can lead to severe health complications, making early detection crucial for effective treatment. This study provides an efficient method based on deep learning convolutional neural network (CNN) and long short-term memory (LSTM) approaches to categorize and detect cardiovascular problems utilizing ECG data to increase classifications (referring to distinguishing between different ECG signal categories) and precision. Additionally, a threshold-based classifier is developed for the telehealth system's security and privacy to enable user identification (for selecting the correct user from a group) using ECG data. A data preprocessing and augmentation technique was applied to improve the data quality and quantity. The proposed LSTM model attained 99.5% accuracy in the classification of cardiac diseases and 98.6% accuracy in user authentication utilizing ECG signals. These results exhibit enhanced performance compared to conventional machine learning and convolutional neural network models.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e3082"},"PeriodicalIF":2.5,"publicationDate":"2025-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453811/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145132508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A literature review of research on question generation in education
Xiaohui Dong, Xinyu Zhang, Zhengluo Li, Quanxin Hou, Jixiang Xue, Xiaoyi Li
Pub Date: 2025-09-16 | DOI: 10.7717/peerj-cs.3203
As a key natural language processing (NLP) task, question generation (QG) is crucial for boosting educational quality and fostering personalized learning. This article offers an in-depth review of the research advancements and future directions of QG in education (QGEd). We start by tracing the evolution of QG and QGEd. Next, we explore the current state of QGEd research through three dimensions: its three core objectives, commonly used datasets, and question quality evaluation methods. The review's unique contributions include a systematic analysis of the research landscape and an identification of pivotal challenges and opportunities. Lastly, we highlight future research directions, emphasizing the need for deeper exploration of multimodal data processing, controllability of fine-grained cognitive and difficulty levels, specialized educational dataset construction, automatic evaluation technology, and system architecture design in QGEd. Overall, this review aims to provide a comprehensive overview of the field, offering valuable insights for researchers and practitioners in educational technology.
{"title":"A literature review of research on question generation in education.","authors":"Xiaohui Dong, Xinyu Zhang, Zhengluo Li, Quanxin Hou, Jixiang Xue, Xiaoyi Li","doi":"10.7717/peerj-cs.3203","DOIUrl":"10.7717/peerj-cs.3203","url":null,"abstract":"<p><p>As a key natural language processing (NLP) task, question generation (QG) is crucial for boosting educational quality and fostering personalized learning. This article offers an in-depth review of the research advancements and future directions in QG in education (QGEd). We start by tracing the evolution of QG and QGEd. Next, we explore the current state of QGEd research through three dimensions: its three core objectives, commonly used datasets, and question quality evaluation methods. This article also underscores its unique contributions to QGEd, including a systematic analysis of the research landscape and an identification of pivotal challenges and opportunities. Lastly, we highlight future research directions, emphasizing the need for deeper exploration in QGEd regarding multimodal data processing, controllability of fine-grained cognitive and difficulty levels, specialized educational dataset construction, automatic evaluation technology development, and system architecture design. Overall, this review aims to provide a comprehensive overview of the field, offering valuable insights for researchers and practitioners in educational technology.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e3203"},"PeriodicalIF":2.5,"publicationDate":"2025-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453861/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145132584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Parametric art creation platform design based on visual delivery and multimedia data fusion
Pub Date: 2025-09-16 | eCollection Date: 2025-01-01 | DOI: 10.7717/peerj-cs.3175
Qing Yun
In the era of informational ascendancy, artistic communication has transcended the confines of conventional physical domains and geographical boundaries, extending its reach across the globe. Consequently, the predominant mode of artistic interaction has shifted toward swift and extensive engagement through virtual platforms. This shift, however, creates the task of meticulously categorizing and labeling an extensive corpus of artistic works, demanding substantial time and human resources. This article introduces an innovative bimodal time series classification model (BTSCM) network for categorizing and labeling artworks on virtual platforms. Rooted in the principles of visual communication and leveraging multimedia fusion technology, the proposed model discerns categories within video content. The BTSCM framework first separates video data into its constituent image and sound elements, following the conceptual framework of visual communication. Features for the two modalities are then extracted with an Inflated 3D ConvNet and Mel frequency cepstrum coefficients (MFCC), respectively. The extracted features are fused through a combination of a fully convolutional network (FCN), a deep Q-network (DQN), and long short-term memory (LSTM), which together constitute the BTSCM network model and enable high-precision video classification. Experimental findings substantiate the efficacy of the framework, with outstanding classification results across diverse video classification datasets: the classification recognition rate on the self-established art platform exceeds 90%, surpassing benchmarks set by multiple multimodal fusion recognition networks. These outcomes underscore the BTSCM framework's potential, providing a theoretical and methodological foundation for the future scrutiny and annotation of content within art creation platforms.
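The bimodal split at the front of the pipeline can be sketched as follows: MFCCs summarize the audio track, a fixed-length vector (a random placeholder standing in for Inflated 3D ConvNet features) covers the frames, and the two are concatenated for a downstream classifier. It assumes librosa; the synthetic tone and feature sizes are illustrative.

```python
# Bimodal feature sketch: pooled MFCCs for audio plus a placeholder
# video embedding, concatenated into one feature vector.
import numpy as np
import librosa

sr = 22050
t = np.linspace(0, 1.0, sr, endpoint=False)
audio = 0.5 * np.sin(2 * np.pi * 440 * t).astype(np.float32)  # 1 s test tone

mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13)   # (13, frames)
audio_feat = mfcc.mean(axis=1)                           # pool over time

video_feat = np.random.default_rng(0).normal(size=400)   # placeholder for I3D
fused = np.concatenate([audio_feat, video_feat])         # bimodal feature
print(fused.shape)                                       # (413,)
```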
{"title":"Parametric art creation platform design based on visual delivery and multimedia data fusion.","authors":"Qing Yun","doi":"10.7717/peerj-cs.3175","DOIUrl":"10.7717/peerj-cs.3175","url":null,"abstract":"<p><p>In the era of informational ascendancy, the discourse of artistic communication has transcended the confines of conventional physical domains and geographical boundaries, extending its purview ubiquitously across the global expanse. Consequently, the predominant mode of artistic interaction has evolved towards swift and extensive engagement through virtual platforms. However, this paradigm shift has given rise to the imperative task of meticulous categorization and labeling of an extensive <i>corpus</i> of artistic works, demanding substantial temporal and human resources. This article introduces an innovative bimodal time series classification model (BTSCM) network for the purpose of categorizing and labeling artworks on virtual platforms. Rooted in the foundational principles of visual communication and leveraging multimedia fusion technology, the proposed model proves instrumental in discerning categories within the realm of video content. The BTSCM framework initiates the classification of video data into constituent image and sound elements, employing the conceptual framework of visual communication. Subsequently, feature extraction for both forms of information is achieved through the application of Inflated 3D ConvNet and Mel frequency cepstrum coefficient (MFCC). The synthesis of these extracted features is orchestrated through a fusion of fully convolutional network (FCN), deep Q-network (DQN), and long short-term memory (LSTM), collectively manifesting as the BTSCM network model. This amalgamated network, shaped by the union of fully convolutional network (FCN), DQN, and LSTM, adeptly conducts information processing, culminating in the realization of high-precision video classification. Experimental findings substantiate the efficacy of the BTSCM framework, as evidenced by outstanding classification results across diverse video classification datasets. The classification recognition rate on the self-established art platform exceeds 90%, surpassing benchmarks set by multiple multimodal fusion recognition networks. These commendable outcomes underscore the BTSCM framework's potential significance, providing a theoretical and methodological foundation for the prospective scrutiny and annotation of content within art creation platforms.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e3175"},"PeriodicalIF":2.5,"publicationDate":"2025-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453805/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145132589","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Feature selection for emotion recognition in speech: a comparative study of filter and wrapper methods
Pub Date: 2025-09-16 | eCollection Date: 2025-01-01 | DOI: 10.7717/peerj-cs.3180
Alaa Altheneyan, Aseel Alhadlaq
Feature selection is essential for enhancing the performance and reducing the complexity of speech emotion recognition models. This article evaluates various feature selection methods, including correlation-based (CB), mutual information (MI), and recursive feature elimination (RFE), against baseline approaches using three different feature sets: (1) all available feature types (Mel-frequency cepstral coefficients (MFCC), root mean square energy (RMS), zero crossing rate (ZCR), chromagram, spectral centroid frequency (SCF), Tonnetz, Mel spectrogram, and spectral bandwidth), totaling 170 features; (2) a subset of five feature types (MFCC, RMS, ZCR, chromagram, and Mel spectrogram), totaling 163 features; and (3) a subset of six feature types (MFCC, RMS, ZCR, SCF, Tonnetz, and Mel spectrogram), totaling 157 features. Methods are compared on precision, recall, F1-score, accuracy, and the number of features selected. Results show that using all features yields an accuracy of 61.42% but often includes irrelevant data. MI with 120 features achieves the highest performance, with precision, recall, F1-score, and accuracy of 65%, 65%, 65%, and 64.71%, respectively. CB methods with moderate thresholds also perform well, balancing simplicity and accuracy. RFE methods improve consistently as more features are added, stabilizing around 120 features.
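The best-performing configuration above (mutual information, keeping 120 of 170 features) maps directly onto scikit-learn's SelectKBest, sketched below with synthetic data standing in for the extracted speech-emotion features.

```python
# Mutual-information filter selection: score all 170 features against
# the labels and keep the top 120 before training a classifier.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=170,
                           n_informative=40, n_classes=4,
                           n_clusters_per_class=1, random_state=0)

selector = SelectKBest(mutual_info_classif, k=120)
X_sel = selector.fit_transform(X, y)       # keep the 120 top-MI features

acc = cross_val_score(SVC(), X_sel, y, cv=5).mean()
print(f"{X_sel.shape[1]} features, CV accuracy {acc:.3f}")
```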
{"title":"Feature selection for emotion recognition in speech: a comparative study of filter and wrapper methods.","authors":"Alaa Altheneyan, Aseel Alhadlaq","doi":"10.7717/peerj-cs.3180","DOIUrl":"10.7717/peerj-cs.3180","url":null,"abstract":"<p><p>Feature selection is essential for enhancing the performance and reducing the complexity of speech emotion recognition models. This article evaluates various feature selection methods, including correlation-based (CB), mutual information (MI), and recursive feature elimination (RFE), against baseline approaches using three different feature sets: (1) all available features (Mel-frequency cepstral coefficients (MFCC), root mean square energy (RMS), zero crossing rate (ZCR), chromagram, spectral centroid frequency (SCF), Tonnetz, Mel spectrogram, and spectral bandwidth), totaling 170 features; (2) a five-feature subset (MFCC, RMS, ZCR, Chromagram, and Mel spectrogram), totaling 163 features; and (3) a six-feature subset (MFCC, RMS, ZCR, SCF, Tonnetz, and Mel spectrogram), totaling 157 features. Methods are compared based on precision, recall, F1-score, accuracy, and the number of features selected. Results show that using all features yields an accuracy of 61.42%, but often includes irrelevant data. MI with 120 features achieves the highest performance, with precision, recall, F1-score, and accuracy at 65%, 65%, 65%, and 64.71%, respectively. CB methods with moderate thresholds also perform well, balancing simplicity and accuracy. RFE methods improve consistently with more features, stabilizing around 120 features.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e3180"},"PeriodicalIF":2.5,"publicationDate":"2025-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453713/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145132524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enhancing human activity recognition with machine learning: insights from smartphone accelerometer and magnetometer data
Pub Date: 2025-09-15 | eCollection Date: 2025-01-01 | DOI: 10.7717/peerj-cs.3137
Luis Augusto Silva Zendron, Paulo Jorge Coelho, Christophe Soares, Ivo Pereira, Ivan Miguel Pires
The domain of Human Activity Recognition (HAR) has undergone a remarkable evolution, driven by advancements in sensor technology, artificial intelligence (AI), and machine learning algorithms. This article builds on previously obtained results by applying additional techniques to the same dataset, aiming to improve on the earlier studies: neural networks with different configurations, random forest, support vector machine, the CN2 rule inducer, Naive Bayes, and AdaBoost. The methodology comprises data collection from smartphone sensors, data cleaning and normalization, feature extraction, and the implementation of the various machine learning models. The neural network and random forest models proved highly effective across multiple metrics, achieving an area under the curve (AUC) of 98.42%, a classification accuracy of 90.14%, an F1-score of 90.13%, a precision of 90.18%, and a recall of 90.14%. With significantly reduced computational cost, our approach outperforms earlier models using the same dataset and achieves results comparable to those of contemporary deep learning-based approaches. Unlike prior studies, our work uses non-normalized data and integrates magnetometer signals to enhance performance, all while employing lightweight models within a reproducible visual workflow. The approach is novel, efficient, and deployable on mobile devices, making it an ideal fit for real-time mobile applications.
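A compact sketch of this classical HAR recipe appears below: slide a window over multi-axis accelerometer/magnetometer streams, extract simple per-axis statistics, and train a random forest. The window length, the chosen statistics, and the two synthetic "activities" are illustrative assumptions.

```python
# Windowed statistical features over 6-axis sensor streams (3 accel +
# 3 magneto), classified with a random forest.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def make_windows(label, n=100, win=128, axes=6):
    # Synthetic windows; activity 1 adds a periodic component as a
    # stand-in for a rhythmic movement such as walking.
    t = np.arange(win)
    sig = rng.normal(0, 1, (n, win, axes))
    if label == 1:
        sig += np.sin(2 * np.pi * t / 16)[None, :, None]
    feats = np.concatenate([sig.mean(1), sig.std(1),
                            np.abs(sig).max(1)], axis=1)  # per-axis stats
    return feats, np.full(n, label)

X0, y0 = make_windows(0)
X1, y1 = make_windows(1)
X, y = np.vstack([X0, X1]), np.concatenate([y0, y1])

clf = RandomForestClassifier(n_estimators=100, random_state=0)
print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean().round(3))
```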
{"title":"Enhancing human activity recognition with machine learning: insights from smartphone accelerometer and magnetometer data.","authors":"Luis Augusto Silva Zendron, Paulo Jorge Coelho, Christophe Soares, Ivo Pereira, Ivan Miguel Pires","doi":"10.7717/peerj-cs.3137","DOIUrl":"10.7717/peerj-cs.3137","url":null,"abstract":"<p><p>The domain of Human Activity Recognition (HAR) has undergone a remarkable evolution, driven by advancements in sensor technology, artificial intelligence (AI), and machine learning algorithms. The aim of this article consists of taking as a basis the previously obtained results to implement other techniques to analyze the same dataset and improve the results previously obtained in the different studies, such as neural networks with different configurations, random forest, support vector machine, CN2 rule inducer, Naive Bayes, and AdaBoost. The methodology consists of data collection from smartphone sensors, data cleaning and normalization, feature extraction techniques, and the implementation of various machine learning models. The study analyzed machine learning models for recognizing human activities using data from smartphone sensors. The results showed that the neural network and random forest models were highly effective across multiple metrics. The models achieved an area under the curve (AUC) of 98.42%, a classification accuracy of 90.14%, an F1-score of 90.13%, a precision of 90.18%, and a recall of 90.14%. With significantly reduced computational cost, our approach outperforms earlier models using the same dataset and achieves results comparable to those of contemporary deep learning-based approaches. Unlike prior studies, our work utilizes non-normalized data and integrates magnetometer signals to enhance performance, all while employing lightweight models within a reproducible visual workflow. This approach is novel, efficient, and deployable on mobile devices in real-time. This approach makes it an ideal fit for real-time mobile applications.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e3137"},"PeriodicalIF":2.5,"publicationDate":"2025-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453735/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145132408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}