Unveiling the capabilities of vision transformers in sperm morphology analysis: a comparative evaluation
Pub Date: 2025-09-10 | eCollection Date: 2025-01-01 | DOI: 10.7717/peerj-cs.3173
Abdulsamet Aktas, Gorkem Serbes, Hamza Osman Ilhan
Traditional sperm morphology assessment relies on manual visual inspection or semi-automated computer-aided sperm analysis (CASA) systems, which often require labor-intensive pre-processing steps. While recent machine learning approaches, particularly convolutional neural networks (CNNs), have improved feature extraction from sperm images, achieving a fully automated and highly accurate system remains challenging due to the complexity of sperm morphology and the need for specialized image adjustments. This study presents a novel, end-to-end automated sperm morphology analysis framework based on vision transformers (ViTs), which processes raw sperm images from two benchmark datasets, Human Sperm Head Morphology (HuSHeM) and Sperm Morphology Image Data Set (SMIDS), without manual pre-processing. We conducted an extensive hyperparameter optimization study across eight ViT variants, evaluating learning rates, optimization algorithms, and data augmentation scales. Our experiments demonstrated that data augmentation significantly enhances ViT performance by improving generalization, particularly in limited-data scenarios. A comparative analysis of CNNs, hybrid models, and pure ViTs revealed that transformer-based architectures consistently outperform traditional methods. The BEiT_Base model achieved state-of-the-art accuracies of 92.5% (SMIDS) and 93.52% (HuSHeM), surpassing prior CNN-based approaches by 1.63% and 1.42%, respectively. Statistical significance (p < 0.05, t-test) confirmed these improvements. Visualization techniques (Attention Maps, Grad-CAM) further validated ViTs' superior ability to capture long-range spatial dependencies and discriminative morphological features, such as head shape and tail integrity. Our work bridges a critical gap in reproductive medicine by delivering a scalable, fully automated solution that eliminates manual intervention while improving diagnostic accuracy. These findings underscore the potential of transformer-based models in clinical andrology, with implications for broader applications in biomedical image analysis.
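A minimal sketch of the kind of pipeline the abstract describes: fine-tuning a pretrained BEiT-Base classifier on augmented sperm-head images, assuming the `timm` and `torchvision` packages. The dataset path, augmentation choices, and hyperparameters are illustrative assumptions, not the authors' exact configuration.

```python
import timm
import torch
from torch import nn
from torchvision import datasets, transforms

# Augmentation pipeline (the paper studies augmentation scales; these are examples)
train_tfms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ToTensor(),
])
train_ds = datasets.ImageFolder("hushem/train", transform=train_tfms)  # hypothetical path
loader = torch.utils.data.DataLoader(train_ds, batch_size=32, shuffle=True)

# BEiT-Base backbone with a fresh head for the four HuSHeM classes
model = timm.create_model("beit_base_patch16_224", pretrained=True, num_classes=4)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:
    opt.zero_grad()
    loss = loss_fn(model(images), labels)  # end-to-end, no manual pre-processing
    loss.backward()
    opt.step()
```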
{"title":"Unveiling the capabilities of vision transformers in sperm morphology analysis: a comparative evaluation.","authors":"Abdulsamet Aktas, Gorkem Serbes, Hamza Osman Ilhan","doi":"10.7717/peerj-cs.3173","DOIUrl":"10.7717/peerj-cs.3173","url":null,"abstract":"<p><p>Traditional sperm morphology assessment relies on manual visual inspection or semi-automated computer-aided sperm analysis (CASA) systems, which often require labor-intensive pre-processing steps. While recent machine learning approaches, particularly convolutional neural networks (CNNs), have improved feature extraction from sperm images, achieving a fully automated and highly accurate system remains challenging due to the complexity of sperm morphology and the need for specialized image adjustments. This study presents a novel, end-to-end automated sperm morphology analysis framework based on vision transformers (ViTs), which processes raw sperm images from two benchmark datasets-Human Sperm Head Morphology (HuSHeM) and Sperm Morphology Image Data Set (SMIDS)-without manual pre-processing. We conducted an extensive hyperparameter optimization study across eight ViT variants, evaluating learning rates, optimization algorithms, and data augmentation scales. Our experiments demonstrated that data augmentation significantly enhances ViT performance by improving generalization, particularly in limited-data scenarios. A comparative analysis of CNNs, hybrid models, and pure ViTs revealed that transformer-based architectures consistently outperform traditional methods. The BEiT_Base model achieved state-of-the-art accuracies of 92.5% (SMIDS) and 93.52% (HuSHeM), surpassing prior CNN-based approaches by 1.63% and 1.42%, respectively. Statistical significance (<i>p</i> < 0.05, <i>t</i>-test) confirmed these improvements. Visualization techniques (Attention Maps, Grad-CAM) further validated ViTs' superior ability to capture long-range spatial dependencies and discriminative morphological features, such as head shape and tail integrity. Our work bridges a critical gap in reproductive medicine by delivering a scalable, fully automated solution that eliminates manual intervention while improving diagnostic accuracy. These findings underscore the potential of transformer-based models in clinical andrology, with implications for broader applications in biomedical image analysis.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e3173"},"PeriodicalIF":2.5,"publicationDate":"2025-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453802/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145132677","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A GAN-based approach to solar radiation prediction: data augmentation and model optimization for Saudi Arabia
Pub Date: 2025-09-10 | eCollection Date: 2025-01-01 | DOI: 10.7717/peerj-cs.3189
Abdalla Alameen, Sultan Mesfer Aldossary
Background: Accurate solar radiation prediction is essential for optimizing renewable energy systems but remains challenging due to data scarcity and variability. This study addresses these challenges by employing generative adversarial networks (GANs) to generate high-quality synthetic solar radiation data.
Methods: A novel framework was developed that integrates GAN-generated synthetic data with machine learning and deep learning models, including CNN-LSTM architectures. These models were trained and evaluated using augmented datasets to improve predictive accuracy and adaptability across diverse climatic zones.
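A compact sketch of the augmentation idea in the Methods: a small GAN that learns to emit synthetic solar-radiation sequences for training-set enrichment. The layer sizes, sequence length, and training loop are assumptions; the paper's exact GAN architecture is not reproduced here.

```python
import torch
from torch import nn

SEQ_LEN, NOISE_DIM = 24, 16  # hourly values per day (assumed granularity)

G = nn.Sequential(nn.Linear(NOISE_DIM, 64), nn.ReLU(),
                  nn.Linear(64, SEQ_LEN))                    # generator
D = nn.Sequential(nn.Linear(SEQ_LEN, 64), nn.LeakyReLU(0.2),
                  nn.Linear(64, 1), nn.Sigmoid())            # discriminator

bce = nn.BCELoss()
g_opt = torch.optim.Adam(G.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(D.parameters(), lr=2e-4)

def train_step(real_batch):  # real_batch: (B, SEQ_LEN) measured radiation
    b = real_batch.size(0)
    # Discriminator: push real sequences toward 1, generated toward 0
    fake = G(torch.randn(b, NOISE_DIM)).detach()
    d_opt.zero_grad()
    d_loss = bce(D(real_batch), torch.ones(b, 1)) + bce(D(fake), torch.zeros(b, 1))
    d_loss.backward()
    d_opt.step()
    # Generator: produce sequences the discriminator scores as real
    g_opt.zero_grad()
    g_loss = bce(D(G(torch.randn(b, NOISE_DIM))), torch.ones(b, 1))
    g_loss.backward()
    g_opt.step()
```

Once trained, `G` can be sampled to augment scarce station data before fitting the downstream CNN-LSTM forecaster.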
Results: Models trained on augmented datasets exhibited significant improvements, with root mean square error (RMSE) reduced by 15.2% and mean absolute error (MAE) decreased by 19.9%. The framework effectively bridged data gaps and enhanced model generalization, enabling applicability across various climatic regions in Saudi Arabia.
Conclusions: The proposed framework facilitates practical applications such as photovoltaic system optimization, grid stability enhancement, and resource planning. By aligning with Saudi Arabia's Vision 2030 and global renewable energy objectives, this study presents a scalable and adaptable approach to advancing renewable energy systems. Challenges such as computational complexity and hyperparameter sensitivity warrant further investigation; addressing them would strengthen this pathway toward sustainable energy futures worldwide.
{"title":"A GAN-based approach to solar radiation prediction: data augmentation and model optimization for Saudi Arabia.","authors":"Abdalla Alameen, Sultan Mesfer Aldossary","doi":"10.7717/peerj-cs.3189","DOIUrl":"10.7717/peerj-cs.3189","url":null,"abstract":"<p><strong>Background: </strong>Accurate solar radiation prediction is essential for optimizing renewable energy systems but remains challenging due to data scarcity and variability. This study addresses these challenges by employing generative adversarial networks (GANs) to generate high-quality synthetic solar radiation data.</p><p><strong>Methods: </strong>A novel framework was developed that integrates GAN-generated synthetic data with machine learning and deep learning models, including CNN-LSTM architectures. These models were trained and evaluated using augmented datasets to improve predictive accuracy and adaptability across diverse climatic zones.</p><p><strong>Results: </strong>Models trained on augmented datasets exhibited significant improvements, with root mean square error (RMSE) reduced by 15.2% and mean absolute error (MAE) decreased by 19.9%. The framework effectively bridged data gaps and enhanced model generalization, enabling applicability across various climatic regions in Saudi Arabia.</p><p><strong>Conclusions: </strong>The proposed framework facilitates practical applications such as photovoltaic system optimization, grid stability enhancement, and resource planning. By aligning with Saudi Arabia's Vision 2030 and global renewable energy objectives, this study presents a scalable and adaptable approach to advancing renewable energy systems. However, challenges such as computational complexity and hyperparameter sensitivity warrant further investigation, providing a robust pathway toward sustainable energy futures worldwide.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e3189"},"PeriodicalIF":2.5,"publicationDate":"2025-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453800/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145132632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SignVLM: a pre-trained large video model for sign language recognition
Pub Date: 2025-09-09 | eCollection Date: 2025-01-01 | DOI: 10.7717/peerj-cs.3112
Hamzah Luqman
Sign language recognition (SLR) plays a vital role in including people with hearing impairment in the community. It facilitates the recognition of sign gestures and converts them into spoken language. One of the main challenges in developing SLR systems is the lack of annotated datasets, an issue that is more pronounced for low-resourced sign languages. To address it, we propose a pretrained large vision model, SignVLM, for SLR. This work explores the capability of the contrastive language-image pre-training (CLIP) model for SLR: CLIP is used to extract spatial features from the sign video frames, while a Transformer decoder is used for temporal learning. The proposed model was evaluated on four different sign languages using the KArSL, WLASL, LSA64, and AUTSL datasets. Different evaluation settings were followed in this work, including zero-shot and few-shot learning. The proposed model outperformed other models on the KArSL, WLASL, and LSA64 datasets and achieved comparable performance on the AUTSL dataset. The obtained results demonstrate the generalization of the proposed model to new datasets with few samples. The code and data are available at https://github.com/Hamzah-Luqman/signVLM.
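A minimal sketch of the two-stage design: frozen CLIP visual features per frame, followed by a temporal Transformer and a classification head. It assumes the Hugging Face `transformers` package; the temporal module here is an encoder for brevity, whereas the paper uses a Transformer decoder, and the class count is a placeholder.

```python
import torch
from torch import nn
from transformers import CLIPVisionModel

clip = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32")
clip.eval()  # frozen spatial feature extractor

temporal = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=768, nhead=8, batch_first=True),
    num_layers=2)
head = nn.Linear(768, 100)  # e.g., 100 sign classes (assumed)

def classify(video):
    """video: (T, 3, 224, 224) CLIP-preprocessed frames of one sign clip."""
    with torch.no_grad():
        feats = clip(pixel_values=video).pooler_output  # (T, 768) frame features
    out = temporal(feats.unsqueeze(0))                  # (1, T, 768) temporal modeling
    return head(out.mean(dim=1))                        # pooled logits over time
```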
{"title":"SignVLM: a pre-trained large video model for sign language recognition.","authors":"Hamzah Luqman","doi":"10.7717/peerj-cs.3112","DOIUrl":"10.7717/peerj-cs.3112","url":null,"abstract":"<p><p>Sign language recognition (SLR) plays a vital role in including people with hearing impairment in the community. It facilitates the recognition of sign gestures and converts them into spoken languages. One of the main challenges for developing SLR systems is the lack of annotated datasets. This issue is more noticeable with low-resourced sign languages. To address this issue, we propose a pretrained large vision model, SignVLM, for SLR. This work explores the capability of the contrastive language-image pre-training (CLIP) model for SLR. This model is used to extract spatial features from the sign video frames while a Transformer decoder is used for temporal learning. The proposed model has been evaluated on four different sign languages using the KArSL, WLASL, LSA64, and AUSTL datasets. Different evaluation settings have been followed in this work including zero-shot and few-shot learning. The proposed model outperformed other models on the KArSL, WLASL, and LSA64 datasets and achieved comparable performance on the AUTSL dataset. The obtained results demonstrate the generalization of the proposed model to new datasets with few samples. The code and data are available at https://github.com/Hamzah-Luqman/signVLM.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e3112"},"PeriodicalIF":2.5,"publicationDate":"2025-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453763/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145132741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Design of tennis auxiliary teaching system based on reinforcement learning and multi-feature fusion
Pub Date: 2025-09-09 | eCollection Date: 2025-01-01 | DOI: 10.7717/peerj-cs.3188
Shiquan Zhang, Chaohong Gan
To accurately identify and evaluate tennis movements, a tennis auxiliary teaching system based on reinforcement learning and multi-feature fusion was designed, combining deep learning methods with tennis-specific knowledge. The algorithm first extracts human skeletal joint points from a video sequence using a human pose-recognition algorithm, and reinforcement learning is then used to extract and optimize the keyframes. Next, a genetic algorithm is used to fuse the different features. The results demonstrate that the proposed tennis action recognition method achieves a classification accuracy of 98.45% for four types of tennis sub-actions, and its generalization ability exceeds that of graph convolutional network-based techniques such as AGCN and ST-GCN. Finally, after action categorization, the proposed scoring method based on dynamic time warping can deliver accurate, real-time assessment scores for the corresponding actions, reducing the workload of tennis instructors and raising the standard of tennis instruction.
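A minimal sketch of the scoring step: dynamic time warping (DTW) between a student's motion sequence and a reference performance, in pure NumPy. The feature representation (here a 1-D trajectory) and the distance-to-score mapping are assumptions.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) DTW over 1-D or vector-valued sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(np.atleast_1d(a[i - 1]) - np.atleast_1d(b[j - 1]))
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

reference = np.sin(np.linspace(0, 3, 50))        # idealized swing trajectory
attempt = np.sin(np.linspace(0, 3, 60)) + 0.05   # student's slower, noisier swing
score = 100.0 / (1.0 + dtw_distance(reference, attempt))  # higher = closer match
```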
{"title":"Design of tennis auxiliary teaching system based on reinforcement learning and multi-feature fusion.","authors":"Shiquan Zhang, Chaohong Gan","doi":"10.7717/peerj-cs.3188","DOIUrl":"10.7717/peerj-cs.3188","url":null,"abstract":"<p><p>To accurately identify and evaluate tennis movements, a tennis auxiliary teaching system based on reinforcement learning and multi-feature fusion was designed by combining deep learning methods with tennis-related knowledge to recognize and evaluate tennis movements accurately. The algorithm first extracts human skeletal joint points from a video sequence using a human pose-recognition algorithm. Reinforcement learning is then used to extract and optimize the keyframes. Second, genetic algorithms were used to fuse the different features. The results demonstrate that the proposed tennis action recognition method achieves a classification accuracy of 98.45% for four types of tennis subactions. Its generalization ability is greater than that of graph convolutional network-based techniques, such as AGCN and ST-GCN. Lastly, following action categorization, the suggested scoring method based on dynamic temporal warping may deliver accurate and real-time assessment ratings for corresponding actions, lowering the effort of tennis instructors and significantly raising the standard of tennis instruction.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e3188"},"PeriodicalIF":2.5,"publicationDate":"2025-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453859/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145132619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A machine learning assistant for detecting fraudulent activities in synchronous online programming exams
Pub Date: 2025-09-09 | eCollection Date: 2025-01-01 | DOI: 10.7717/peerj-cs.3159
Francisco Ortin, Alonso Gago, Jose Quiroga, Miguel Garcia
The rapid expansion of online learning has made education more accessible but has also introduced significant challenges in maintaining academic integrity, particularly during online exams. For certain types of exams, students are prohibited from connecting to the Internet to prevent them from accessing unauthorized resources, using generative artificial intelligence tools, or engaging in other forms of cheating. However, in online exams, students must remain connected to the Internet. Most existing online proctoring systems rely on various devices to monitor students' actions and environments during the exam, focusing on tracking physical behavior, such as facial expressions, eye movements, and the presence of unauthorized materials, rather than analyzing the students' work on their computers. This often requires human review to determine whether students are engaging in unauthorized actions. This article presents the development and evaluation of a machine-learning-based assistant that helps instructors detect fraudulent activities in real time during online programming exams. Our system uses a convolutional neural network (CNN) followed by a recurrent neural network (RNN) and a dense layer to analyze sequences of screenshot frames captured from students' screens during exams. The system achieves an accuracy of 95.18% and an F2-score of 94.2%, prioritizing recall to emphasize detecting cheating instances while minimizing false positives. Notably, data augmentation and class-weight adjustments during training significantly enhanced the model's performance, while transfer learning and alternative loss functions did not provide additional improvements. In post-deployment feedback, instructors expressed high satisfaction with the system's ability to support the rapid detection of cheating, reinforcing the potential of machine learning for real-time monitoring in large-scale online exams.
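A minimal sketch of the architecture described above: a small CNN encodes each screenshot, an LSTM models the frame sequence, and a dense layer scores the probability of cheating. Input sizes and layer widths are illustrative assumptions, not the deployed architecture.

```python
import torch
from torch import nn

class FraudDetector(nn.Module):
    def __init__(self):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())        # per-frame feature (32)
        self.rnn = nn.LSTM(32, 64, batch_first=True)      # temporal modeling
        self.fc = nn.Linear(64, 1)                        # dense scoring layer

    def forward(self, frames):                            # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        _, (h, _) = self.rnn(feats)
        return torch.sigmoid(self.fc(h[-1]))              # cheating probability

# The F2-score used in the paper weights recall over precision; scikit-learn
# exposes it directly, e.g. fbeta_score(y_true, y_pred, beta=2):
from sklearn.metrics import fbeta_score
```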
{"title":"A machine learning assistant for detecting fraudulent activities in synchronous online programming exams.","authors":"Francisco Ortin, Alonso Gago, Jose Quiroga, Miguel Garcia","doi":"10.7717/peerj-cs.3159","DOIUrl":"10.7717/peerj-cs.3159","url":null,"abstract":"<p><p>The rapid expansion of online learning has made education more accessible but has also introduced significant challenges in maintaining academic integrity, particularly during online exams. For certain types of exams, students are prohibited from connecting to the Internet to prevent them from accessing unauthorized resources, utilizing generative artificial intelligence tools, or engaging in other forms of cheating. However, in online exams, students must remain connected to the Internet. Most existing online proctoring systems rely on various devices to monitor students' actions and environments during the exam, focusing on tracking physical behavior, such as facial expressions, eye movements, and the presence of unauthorized materials, rather than analyzing the students' work within their computers. This often requires human review to determine whether students are engaging in unauthorized actions. This article presents the development and evaluation of a machine-learning-based assistant designed to assist instructors in detecting fraudulent activities in real-time during online programming exams. Our system leverages a convolutional neural network (CNN) followed by a recurrent neural network (RNN) and a dense layer to analyze sequences of screenshot frames captured from students' screens during exams. The system achieves an accuracy of 95.18% and an F<sub>2</sub>-score of 94.2%, prioritizing recall to emphasize detecting cheating instances, while minimizing false positives. Notably, data augmentation and class-weight adjustments during training significantly enhanced the model's performance, while transfer learning and alternative loss functions did not provide additional improvements. In post-deployment feedback, instructors expressed high satisfaction with the system's ability to assist in the rapid detection of cheating, reinforcing the potential of machine learning to support real-time monitoring in large-scale online exams.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e3159"},"PeriodicalIF":2.5,"publicationDate":"2025-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453727/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145132600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enhancing fruit freshness classification with adaptive knowledge distillation and global response normalization in convolutional networks
Pub Date: 2025-09-09 | eCollection Date: 2025-01-01 | DOI: 10.7717/peerj-cs.3198
Semih Demirel, Oktay Yıldız
The assessment of fruit freshness is crucial for ensuring food quality and reducing waste in agricultural production. In this study, we propose Global Response Normalization and Gaussian Error Linear Unit Enhanced Network (GGENet), a novel deep learning architecture that leverages adaptive knowledge distillation (AKD) and global response normalization (GRN) to classify fruits as fresh or rotten. Our model comprises two variants: GGENet-Teacher (GGENet-T), serving as the teacher model, and GGENet-Student (GGENet-S), functioning as the student model. By transferring attention maps from the teacher to the student model, we achieve efficient adaptive knowledge distillation, enhancing the performance of the lighter student model. Experimental results demonstrate that the GGENet with adaptive knowledge distillation (GGENet-AKD) achieves a competitive accuracy of 0.9818, an F1-score of 0.9818, and an area under the curve (AUC) score of 0.9891. The proposed method significantly contributes to reducing food waste and enhancing quality control in agriculture by facilitating early detection of rotting fruits.
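A minimal sketch of the distillation objective behind teacher-to-student transfer: soft-label KL divergence plus an attention-map matching term. The temperature, loss weights, and map shapes are assumptions; GGENet's adaptive weighting scheme is not reproduced here.

```python
import torch
import torch.nn.functional as F

def distill_loss(s_logits, t_logits, s_attn, t_attn, labels,
                 T=4.0, alpha=0.5, beta=100.0):
    # Supervised term on the student's own predictions
    hard = F.cross_entropy(s_logits, labels)
    # Soft-label term: student mimics the teacher's tempered distribution
    soft = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                    F.softmax(t_logits / T, dim=1),
                    reduction="batchmean") * T * T
    # Attention transfer: match normalized spatial attention maps
    attn = F.mse_loss(F.normalize(s_attn.flatten(1), dim=1),
                      F.normalize(t_attn.flatten(1), dim=1))
    return (1 - alpha) * hard + alpha * soft + beta * attn
```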
{"title":"Enhancing fruit freshness classification with adaptive knowledge distillation and global response normalization in convolutional networks.","authors":"Semih Demirel, Oktay Yıldız","doi":"10.7717/peerj-cs.3198","DOIUrl":"10.7717/peerj-cs.3198","url":null,"abstract":"<p><p>The assessment of fruit freshness is crucial for ensuring food quality and reducing waste in agricultural production. In this study, we propose <i>Global Response Normalization and Gaussian Error Linear Unit Enhanced Network (GGENet)</i>, a novel deep learning architecture that leverages adaptive knowledge distillation (AKD) and global response normalization (GRN) to classify fruits as fresh or rotten. Our model comprises two variants: <i>GGENet-Teacher (GGENet-T)</i>, serving as the teacher model, and <i>GGENet-Student (GGENet-S)</i>, functioning as the student model. By transferring attention maps from the teacher to the student model, we achieve efficient adaptive knowledge distillation, enhancing the performance of the lighter student model. Experimental results demonstrate that the <i>GGENet with adaptive knowledge distillation (GGENet-AKD)</i> achieves a competitive accuracy of 0.9818, an F1-score of 0.9818, and an area under the curve (AUC) score of 0.9891. The proposed method significantly contributes to reducing food waste and enhancing quality control in agriculture by facilitating early detection of rotting fruits.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e3198"},"PeriodicalIF":2.5,"publicationDate":"2025-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453794/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145132387","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mitigating inappropriate concepts in text-to-image generation with attention-guided image editing
Pub Date: 2025-09-09 | eCollection Date: 2025-01-01 | DOI: 10.7717/peerj-cs.3170
Jiyeon Oh, Jae-Yeop Jeong, Yeong-Gi Hong, Jin-Woo Jeong
Text-to-image generative models have recently attracted significant attention due to their ability to produce diverse images from given text prompts. However, concerns have arisen regarding the occasional generation of inappropriate, offensive, or explicit content. To address this, we propose a simple yet effective method that leverages attention maps to selectively suppress inappropriate concepts during image generation. Unlike existing approaches that often sacrifice the original image context or demand substantial computational overhead, our method preserves image integrity without requiring additional model training or extensive engineering effort. To evaluate our method, we conducted comprehensive quantitative assessments of inappropriateness reduction, text fidelity, image consistency, and computational cost, alongside an online human perceptual study involving 20 participants. The results of our statistical analysis demonstrated that our method effectively removes inappropriate content while preserving the integrity of the original images with high computational efficiency.
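A minimal sketch of the suppression idea: given softmaxed cross-attention scores between image patches and prompt tokens, downweight the columns for flagged concept tokens and renormalize, leaving the rest of the map (and hence the image context) intact. How these scores are intercepted inside a specific diffusion pipeline is omitted, and the scale factor is an assumption.

```python
import torch

def suppress_tokens(attn_probs, flagged_idx, scale=0.1):
    """attn_probs: (heads, patches, tokens) softmaxed cross-attention.
    flagged_idx: indices of prompt tokens tied to inappropriate concepts."""
    out = attn_probs.clone()
    out[..., flagged_idx] *= scale                 # damp flagged concept tokens
    return out / out.sum(dim=-1, keepdim=True)     # renormalize per query patch
```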
{"title":"Mitigating inappropriate concepts in text-to-image generation with attention-guided Image editing.","authors":"Jiyeon Oh, Jae-Yeop Jeong, Yeong-Gi Hong, Jin-Woo Jeong","doi":"10.7717/peerj-cs.3170","DOIUrl":"10.7717/peerj-cs.3170","url":null,"abstract":"<p><p>Text-to-image generative models have recently garnered a significant surge due to their ability to produce diverse images based on given text prompts. However, concerns regarding the occasional generation of inappropriate, offensive, or explicit content have arisen. To address this, we propose a simple yet effective method that leverages attention map to selectively suppress inappropriate concepts during image generation. Unlike existing approaches that often sacrifice original image context or demand substantial computational overhead, our method preserves image integrity without requiring additional model training or extensive engineering effort. To evaluate our method, we conducted comprehensive quantitative assessments on inappropriateness reduction, text fidelity, image consistency, and computational cost, alongside an online human perceptual study involving 20 participants. The results from our statistical analysis demonstrated that our method effectively removes inappropriate content while preserving the integrity of the original images with high computational efficiency.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e3170"},"PeriodicalIF":2.5,"publicationDate":"2025-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453712/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145132722","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A novel deep learning based approach with hyperparameter selection using grey wolf optimization for leukemia classification and hematologic malignancy detection
Pub Date: 2025-09-08 | eCollection Date: 2025-01-01 | DOI: 10.7717/peerj-cs.3160
Shams Ur Rehman, Robertas Damaševicius, Hassan Al Sukhni, Abeer Aljohani, Ameer Hamza, Deema Mohammed Alsekait, Diaa Salama AbdElminaam
Traditional diagnostic methods for leukemia, a blood cancer, rely on visual assessment of white blood cells in microscopic peripheral blood smears and are therefore subjective, laborious, and susceptible to errors. This study proposes a new automated deep learning-based framework for accurately classifying leukemia. A novel lightweight algorithm based on the hyperbolic sine function was designed for contrast enhancement. In the next step, we proposed a customized convolutional neural network (CNN) model based on a parallel inverted dual self-attention network (PIDSAN4), and a tiny16 Vision Transformer (ViT) was employed. The hyperparameters were tuned using grey wolf optimization and then used to train the models. The experiments were carried out on a publicly available leukemia microscopic image dataset, and the proposed model achieved 0.913 accuracy, 0.892 sensitivity, 0.925 specificity, 0.883 precision, 0.894 F-measure, and 0.901 G-mean. The results were compared with state-of-the-art pre-trained models, showing that the proposed model improved accuracy.
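A minimal sketch of hyperbolic-sine-based contrast enhancement: mapping intensities to [-1, 1] and stretching them with sinh, which pushes values away from mid-gray. The gain parameter and normalization are assumptions about the paper's algorithm, not its exact formulation.

```python
import numpy as np

def sinh_contrast(img, gain=2.0):
    """img: uint8 grayscale or RGB array. Returns contrast-enhanced uint8 array."""
    x = img.astype(np.float32) / 255.0 * 2.0 - 1.0   # map intensities to [-1, 1]
    y = np.sinh(gain * x) / np.sinh(gain)            # sinh stretch, still in [-1, 1]
    return ((y + 1.0) / 2.0 * 255.0).astype(np.uint8)
```

Because sinh is odd and grows faster than linearly, dark pixels get darker and bright pixels brighter, widening the contrast around the mid-tone before the image reaches the classifier.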
{"title":"A novel deep learning based approach with hyperparameter selection using grey wolf optimization for leukemia classification and hematologic malignancy detection.","authors":"Shams Ur Rehman, Robertas Damaševicius, Hassan Al Sukhni, Abeer Aljohani, Ameer Hamza, Deema Mohammed Alsekait, Diaa Salama AbdElminaam","doi":"10.7717/peerj-cs.3160","DOIUrl":"10.7717/peerj-cs.3160","url":null,"abstract":"<p><p>Traditional diagnostic methods of leukemia, a blood cancer disease, are based on visual assessment of white cells in microscopic peripheral blood smears, and as a result, they are arbitrary, laborious, and susceptible to errors. This study proposes a new automated deep learning-based framework for accurately classifying leukemia cancer. A novel lightweight algorithm based on the hyperbolic sin function has been designed for contrast enhancement. In the next step, we proposed a customized convolutional neural network (CNN) model based on a parallel inverted dual self-attention network (PIDSAN4), and a tiny16 Vision Transformer (ViT) has been employed. The hyperparameters were tuned using the grey wolf optimization and then used to train the models. The experiment is carried out on a publicly available leukemia microscopic images dataset, and the proposed model achieved 0.913 accuracy, 0.892 sensitivity, 0.925 specificity, 0.883 precision, 0.894 F-measure, and 0.901 G-mean. The results were compared with state-of-the-art pre-trained models, showing that the proposed model improved accuracy.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e3160"},"PeriodicalIF":2.5,"publicationDate":"2025-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453765/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145132618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
HEMF: an adaptive hierarchical enhanced multi-attention feature fusion framework for cross-scale medical image classification
Pub Date: 2025-09-08 | eCollection Date: 2025-01-01 | DOI: 10.7717/peerj-cs.3181
Jingdong He, Qiang Shi, Jun Ma, Dacheng Shi, Tie Min
Medical image classification is essential for contemporary clinical diagnosis and decision support systems. However, medical images generally have similar inter-class features and complex structural patterns, making classification a challenging task. While both local and global features are critical for noise reduction and discriminative pattern extraction in medical images, conventional approaches exhibit limitations. Specifically, convolutional neural networks (CNNs) focus on local feature extraction but lack a comprehensive understanding of global semantics. Conversely, vision transformers (ViTs) can model long-range feature dependencies but may disrupt local features. To address these limitations, we propose Hierarchical Enhanced Multi-attention Feature (HEMF), an adaptive hierarchical enhanced multi-attention feature fusion framework that synergistically extracts and fuses multi-scale local and global features. It comprises two core components: (1) enhanced local and global feature extraction modules that extract multi-scale local and global features in parallel; and (2) a hierarchical enhanced feature fusion module integrating a novel attention mechanism named Mixed Attention (MA) and a novel inverted residual block named Squeezed Inverted Residual Multi-Layer Perceptron (SIRMLP) to effectively fuse multi-scale features. Experimental results demonstrate that, with nearly the fewest model parameters among the compared advanced models, HEMF achieves accuracy and F1-scores of 87.34% and 78.89% on the ISIC2018 dataset, 87.03% and 87.02% on the Kvasir dataset, and 82.26% and 82.20% on the COVID-19 CT dataset, constituting state-of-the-art performance. Our code is open source and available at https://github.com/Esgjgd/HEMF.
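A minimal sketch of one fusion ingredient: an inverted-residual MLP block in the spirit of SIRMLP (expand, apply a nonlinearity, squeeze back, add a skip connection). The dimensions and expansion ratio are assumptions, not the paper's values.

```python
import torch
from torch import nn

class InvertedResidualMLP(nn.Module):
    """Expand -> GELU -> squeeze, with a residual path over token features."""
    def __init__(self, dim, expansion=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, dim * expansion),  # invert: widen the channel dim
            nn.GELU(),
            nn.Linear(dim * expansion, dim))  # squeeze back to input width

    def forward(self, x):                     # x: (B, tokens, dim)
        return x + self.net(x)                # residual fusion path
```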
{"title":"HEMF: an adaptive hierarchical enhanced multi-attention feature fusion framework for cross-scale medical image classification.","authors":"Jingdong He, Qiang Shi, Jun Ma, Dacheng Shi, Tie Min","doi":"10.7717/peerj-cs.3181","DOIUrl":"10.7717/peerj-cs.3181","url":null,"abstract":"<p><p>Medical image classification is essential for contemporary clinical diagnosis and decision support systems. However, medical images generally have similar inter-class features and complex structure patterns, making it a challenging task. While both local and global features are critical for noise reduction and discriminative pattern extraction in medical images, conventional approaches exhibit limitations. Specifically, convolutional neural networks (CNNs) focus on local features extraction but lack a comprehensive understanding of global semantic. Conversely, vision transformers (ViTs) can model long-range feature dependencies but may cause disruption to local features. To address these limitations, we propose Hierarchical Enhanced Multi-attention Feature (HEMF), an adaptive hierarchical enhanced multi-attention feature fusion framework to synergistically extract and fuse multi-scale local and global features. It comprises two core components: (1) the enhanced local and global feature extraction modules to extract multi-scale local and global features in parallel; (2) the hierarchical enhanced feature fusion module integrating a novel attention mechanism named Mixed Attention (MA) and a novel inverted residual block named Squeezed Inverted Residual Multi-Layer Perceptron (SIRMLP) to effectively fuse multi-scale features. Experimental results demonstrate that with nearly minimal model parameters compared to other advanced models, HEMF achieves the accuracy and F1-score of 87.34% and 78.89% on the ISIC2018 dataset, 87.03% and 87.02% on the Kvasir dataset, and 82.26% and 82.20% on the COVID-19 CT dataset, which are the state-of-the-art performance. Our code is open source and available from https://github.com/Esgjgd/HEMF.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e3181"},"PeriodicalIF":2.5,"publicationDate":"2025-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453837/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145132659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Predicting academic performance for students' university: case study from Saint Cloud State University
Pub Date: 2025-09-08 | eCollection Date: 2025-01-01 | DOI: 10.7717/peerj-cs.3087
Bilal I Al-Ahmad, Abdullah Alzaqebah, Rami Alkhawaldeh, Ala' M Al-Zoubi, Hsuehi Lo, Adel Ali
Predicting student performance is an essential educational data mining task aimed at monitoring learning outcomes. Predicting grade point average (GPA) helps track academic performance and assists advisors in identifying students at risk of failure, major changes, or dropout. To enhance prediction performance, this study employs a long short-term memory (LSTM) model using a rich set of academic and demographic features. The dataset, drawn from 29,455 students at Saint Cloud State University (SCSU) over eight years (2016-2024), was carefully preprocessed by eliminating irrelevant and missing data, encoding categorical variables, and normalizing numerical features. Feature importance was determined using a permutation-based method to identify the variables with the greatest impact on term GPA prediction. Furthermore, model hyperparameters, including the number of LSTM layers, units per layer, batch size, learning rate, and activation functions, were fine-tuned through experimental validation with the Adam optimizer and learning rate scheduling. Two experiments were conducted, at the college and department levels. The proposed model outperformed traditional machine learning models such as linear regression (LR), K-nearest neighbor (KNN), decision tree (DT), random forest (RF), and support vector regressor (SVR), and surpassed two deep learning models, the recurrent neural network (RNN) and convolutional neural network (CNN), achieving a 9.54 mean absolute percentage error (MAPE), 0.0059 mean absolute error (MAE), 0.0001 root mean square error (RMSE), and an R² score of 99%.
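A minimal sketch of the permutation-based importance step: shuffle one feature column at a time and measure the increase in validation error. The model and error metric are placeholders; any fitted regressor exposing a `.predict` method works.

```python
import numpy as np

def permutation_importance(model, X_val, y_val, n_repeats=5, seed=0):
    """Returns per-feature mean increase in MAE when that feature is shuffled."""
    rng = np.random.default_rng(seed)
    base = np.mean(np.abs(model.predict(X_val) - y_val))   # baseline MAE
    scores = np.zeros(X_val.shape[1])
    for j in range(X_val.shape[1]):
        for _ in range(n_repeats):
            Xp = X_val.copy()
            rng.shuffle(Xp[:, j])                          # break feature j only
            scores[j] += np.mean(np.abs(model.predict(Xp) - y_val)) - base
    return scores / n_repeats                              # larger = more impactful
```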
{"title":"Predicting academic performance for students' university: case study from Saint Cloud State University.","authors":"Bilal I Al-Ahmad, Abdullah Alzaqebah, Rami Alkhawaldeh, Ala' M Al-Zoubi, Hsuehi Lo, Adel Ali","doi":"10.7717/peerj-cs.3087","DOIUrl":"10.7717/peerj-cs.3087","url":null,"abstract":"<p><p>Predicting students' performance is one of the essential educational data mining approaches aimed at observing learning outcomes. Predicting grade point average (GPA) helps to monitor academic performance and assists advisors in identifying students at risk of failure, major changes, or dropout. To enhance prediction performance, this study employs a long short-term memory (LSTM) model using a rich set of academic and demographic features. The dataset, drawn from 29,455 students at Saint Cloud State University (SCSU) over eight years (2016-2024), was carefully preprocessed by eliminating irrelevant and missing data, encoding categorical variables, and normalizing numerical features. Feature importance was determined using a permutation-based method to identify the most impactful variables on term GPA prediction. Furthermore, model hyperparameters, including the number of LSTM layers, units per layer, batch size, learning rate, and activation functions, were fine-tuned using experimental validation with the Adam optimizer and learning rate scheduling. Two experiments were conducted at both the college and department levels. The proposed model outperformed traditional machine learning models such as linear regression (LR), K-nearest neighbor (KNN), decision tree (DT), random forest (RF), and support vector regressor (SVR), and it surpasses two deep learning models, recurrent neural network (RNN) and convolutional neural network (CNN), achieving 9.54 mean absolute percentage error (MAPE), 0.0059 mean absolute error (MAE), 0.0001 root mean square error (RMSE), and an R² score of 99%.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e3087"},"PeriodicalIF":2.5,"publicationDate":"2025-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453804/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145132678","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}