Pub Date: 2025-09-03; eCollection Date: 2025-01-01; DOI: 10.7717/peerj-cs.3151
Yang Liu, Shuaixian Liu, Jie Gao, Tao Song, Wenyu Dong
Detecting unsafe human behaviors is crucial for enhancing safety in industrial production environments. Current models face limitations in multi-scale target detection within such settings. This study introduces a novel model, Sec-YOLO, designed specifically for detecting unsafe behaviors. First, the model incorporates a receptive-field attention convolution (RFAConv) module to better focus on the key features of unsafe behaviors. Second, a deformable convolution network v2 (DCNv2) is integrated into the C2f module to enhance the model's adaptability to the continually changing feature structures of unsafe behaviors. Third, the neck architecture of the model is restructured, inspired by the multi-branch auxiliary feature pyramid network (MAFPN) structure. Finally, to improve feature extraction and fusion, feature-enhanced hybrid attention (FEHA) is introduced and integrated with DCNv2 and MAFPN. Experimental results show that Sec-YOLO achieves a mean average precision (mAP) at 0.5 of 92.6% and an mAP at 0.5:0.95 of 63.6% on a custom dataset comprising four common unsafe behaviors: falling, sleeping at the post, using mobile phones, and not wearing safety helmets. These results represent improvements of 2.0% and 2.5%, respectively, over the YOLOv8n model. Sec-YOLO exhibits excellent performance in practical applications, focusing more precisely on feature handling and detection.
Title: Detection of unsafe workplace behaviors: Sec-YOLO model with FEHA attention.
Journal: PeerJ Computer Science, volume 11, e3151
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453777/pdf/
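The Sec-YOLO abstract reports mAP at 0.5 and at 0.5:0.95. As a minimal sketch of what those thresholds refer to, the snippet below computes intersection over union (IoU) for two axis-aligned boxes and lists the IoU thresholds averaged under the common COCO-style convention (0.50 to 0.95 in steps of 0.05). The function names and the `(x1, y1, x2, y2)` box format are assumptions made for illustration; this is not the authors' evaluation code.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """A prediction matches a ground-truth box when IoU reaches the threshold."""
    return iou(pred_box, gt_box) >= threshold


def map_iou_thresholds():
    """IoU thresholds averaged for mAP at 0.5:0.95 (assumed COCO-style steps)."""
    return [round(0.5 + 0.05 * i, 2) for i in range(10)]


if __name__ == "__main__":
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap area 1, union area 7
    print(map_iou_thresholds())
```

A stricter threshold turns borderline matches into misses, which is consistent with the mAP at 0.5:0.95 figure being lower than the mAP at 0.5 figure.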
Pub Date: 2025-09-03; eCollection Date: 2025-01-01; DOI: 10.7717/peerj-cs.3049
Carolina Busco, Felipe González, Paula Farina, Jonathan Vivas, Fernanda Saavedra, Lizbeth Avalos
This research introduces the Municipal Digital Offering Index (MDOI) to assess municipal online service development in Chile. The study utilizes content analysis of municipal websites, creating a systematic instrument to evaluate digital services. It evaluates all 344 Chilean municipalities based on 163 dichotomous variables. Through factor analysis and regression modeling, it investigates sociodemographic and economic factors influencing digital development at the municipal level, offering insights into the digital divide across municipalities. The findings highlight geographical disparities and indicate priority intervention areas. While education levels and financial resources influence digital technology adoption, many municipalities lack efficient online procedures, prompting focused digital transformation investments. This research emphasizes the importance of localized digital services in bridging the digital divide and promoting inclusive governance.
Title: Introducing the municipal digital offering index for evaluating online services and addressing the digital divide.
Journal: PeerJ Computer Science, volume 11, e3049
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453795/pdf/
Pub Date: 2025-09-02; eCollection Date: 2025-01-01; DOI: 10.7717/peerj-cs.3177
Muhammed Abdulhamid Karabiyik, Bahaeddin Turkoglu, Tunc Asuroglu
Class imbalance remains a significant challenge in machine learning, leading to biased models that favor the majority class while failing to accurately classify minority instances. Traditional oversampling methods, such as Synthetic Minority Over-sampling Technique (SMOTE) and its variants, often struggle with class overlap, poor decision boundary representation, and noise accumulation. To address these limitations, this study introduces ClusterDEBO, a novel hybrid oversampling method that integrates K-Means clustering with differential evolution (DE) to generate synthetic samples in a more structured and adaptive manner. The proposed method first partitions the minority class into clusters using the silhouette score to determine the optimal number of clusters. Within each cluster, DE-based mutation and crossover operations are applied to generate diverse and well-distributed synthetic samples while preserving the underlying data distribution. Additionally, a selective sampling and noise reduction mechanism is employed to filter out low-impact synthetic samples based on their contribution to classification performance. The effectiveness of ClusterDEBO is evaluated on 44 benchmark datasets using k-Nearest Neighbors (kNN), decision tree (DT), and support vector machines (SVM) as classifiers. The results demonstrate that ClusterDEBO consistently outperforms existing oversampling techniques, leading to improved class separability and enhanced classifier robustness. Moreover, statistical validation using the Friedman test confirms the significance of the improvements, ensuring that the observed gains are not due to random variations. The findings highlight the potential of cluster-assisted differential evolution as a powerful strategy for handling imbalanced datasets.
Title: A cluster-assisted differential evolution-based hybrid oversampling method for imbalanced datasets.
Journal: PeerJ Computer Science, volume 11, e3177
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453762/pdf/
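The ClusterDEBO abstract describes generating synthetic minority samples inside each cluster with DE-based mutation and crossover. The sketch below shows one plausible reading of that step, assuming the classic DE/rand/1 mutation with binomial crossover and a cluster that has already been found. The function name, the parameters `f` and `cr`, and their default values are illustrative assumptions; the clustering, silhouette-based model selection, and noise filtering from the abstract are not reproduced.

```python
import random


def de_synthetic_samples(cluster, n_new, f=0.5, cr=0.9, seed=0):
    """Generate n_new synthetic points from one minority-class cluster.

    cluster: list of equal-length numeric lists (points of a single cluster).
    f:  differential weight scaling the difference vector (assumed value).
    cr: crossover rate for binomial crossover (assumed value).
    """
    if len(cluster) < 4:
        raise ValueError("DE/rand/1 needs at least 4 distinct points per cluster")
    rng = random.Random(seed)
    dim = len(cluster[0])
    samples = []
    for _ in range(n_new):
        # Pick a target and three other distinct points from the same cluster.
        target, r1, r2, r3 = rng.sample(cluster, 4)
        # Mutation: base vector plus a scaled difference of two others.
        mutant = [r1[d] + f * (r2[d] - r3[d]) for d in range(dim)]
        # Binomial crossover: at least one coordinate always comes from the mutant.
        j_rand = rng.randrange(dim)
        trial = [
            mutant[d] if (rng.random() < cr or d == j_rand) else target[d]
            for d in range(dim)
        ]
        samples.append(trial)
    return samples


if __name__ == "__main__":
    toy_cluster = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
    for point in de_synthetic_samples(toy_cluster, n_new=3):
        print(point)
```

Restricting the donors to a single cluster keeps each new point near that cluster's region, which is the intuition behind pairing clustering with DE in the abstract.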
The swift progression of technology has increased the complexity of cyber fraud, posing an escalating challenge for the banking sector to reliably and efficiently identify fraudulent credit card transactions. Conventional detection approaches fail to adapt to the advancing strategies of fraudsters, resulting in heightened false positives and inefficiency within fraud detection systems. This study addresses these limitations by developing a stacking hybrid machine learning (ML) approach that combines decision trees (DT), random forests (RF), support vector machines (SVM), XGBoost, CatBoost, and logistic regression (LR) within a stacking ensemble framework. Stacking integrates diverse ML models, with a meta-model consolidating the base models' predictions, resulting in superior detection accuracy compared to any single model. The methodology uses data preprocessing techniques, such as correlation-based feature selection and principal component analysis (PCA), to enhance computing efficiency while preserving essential information. Experimental assessments on a credit card transaction dataset reveal that the stacking ensemble model exhibits higher performance, achieving an F1-score of 88.14%, thereby efficiently balancing precision and recall. This outcome highlights the significance of ensemble methods such as stacking in attaining strong and dependable cyber fraud detection, emphasizing their capacity to markedly enhance the security of financial transactions.
Title: Enhancing credit card fraud detection with a stacking-based hybrid machine learning approach.
Authors: Eyad Abdel Latif Marazqah Btoush, Xujuan Zhou, Raj Gururajan, Ka Ching Chan, Omar Alsodi
Pub Date: 2025-09-02; DOI: 10.7717/peerj-cs.3007
Journal: PeerJ Computer Science, volume 11, e3007
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453863/pdf/
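The fraud-detection abstract summarizes its result as an F1-score that balances precision and recall. As a small worked illustration of that metric, the sketch below computes precision, recall, and F1 from confusion-matrix counts. The counts in the example are hypothetical and are not taken from the study.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical counts: 80 frauds caught, 10 false alarms, 20 frauds missed.
    p, r, f1 = precision_recall_f1(tp=80, fp=10, fn=20)
    print(f"precision={p:.4f} recall={r:.4f} f1={f1:.4f}")
```

Because true negatives do not enter the formula, F1 is less easily inflated than plain accuracy on data where legitimate transactions far outnumber fraudulent ones.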
Pub Date: 2025-08-29; eCollection Date: 2025-01-01; DOI: 10.7717/peerj-cs.3129
Maitha Alarjani, Abdulmajeed Almuaibed
Alzheimer's disease (AD) is a progressive neurological disorder that affects millions worldwide, leading to cognitive decline and memory impairment. Structural changes in the brain gradually impair cognitive functions, and by the time symptoms become evident, significant and often irreversible neuronal damage has already occurred. This makes early diagnosis critical, as timely intervention can help slow disease progression and improve patients' quality of life. Recent advancements in machine learning and neuroimaging have enabled early detection of AD using imaging data and computer-aided diagnostic systems. Deep learning, particularly with magnetic resonance imaging (MRI), has gained widespread recognition for its ability to extract high-level features by leveraging localized connections, weight sharing, and three-dimensional invariance. In this study, we present a 3D convolutional neural network (3D-CNN) designed to enhance classification accuracy using data from the latest version of the OASIS database (OASIS-3). Unlike traditional 2D approaches, our model processes full 3D MRI scans to preserve spatial information and prevent information loss during dimensionality reduction. Additionally, we applied preprocessing techniques, including intensity normalization and noise reduction, to enhance image quality and improve classification performance. Our proposed 3D-CNN achieved a classification accuracy of 91%, outperforming several existing models. These results highlight the potential of deep learning in developing more reliable and efficient diagnostic tools for early Alzheimer's detection, paving the way for improved clinical decision-making and patient outcomes.
Title: Optimizing a 3D convolutional neural network to detect Alzheimer's disease based on MRI.
Journal: PeerJ Computer Science, volume 11, e3129
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453851/pdf/
Pub Date: 2025-08-29; eCollection Date: 2025-01-01; DOI: 10.7717/peerj-cs.3152
Qinghua Su, Min Xie, Liyong Wang, Yue Song, Ao Cui, Zhihao Xie
Background: The research on autonomous driving using deep learning has made significant progress on structured roads, but there has been limited research on temporary roads. The End-to-End autonomous driving model is highly integrated, allowing for the direct translation of input data into desired driving actions. This method eliminates inter-module coupling, thereby enhancing the safety and stability of autonomous vehicles.
Methods: We propose a novel End-to-End model for autonomous driving on temporary roads, designed specifically for mobile robots. The model takes three road images as input, extracts image features using the Global Context Vision Transformer (GCViT) network, plans local paths through a Transformer network and a gated recurrent unit (GRU) network, and finally outputs the steering angle through a control model to manage the automatic tracking of unmanned ground vehicles. To verify the model's performance, both simulation tests and field tests were conducted.
Results: The experimental results demonstrate that our End-to-End model accurately identifies temporary roads. The trajectory planning time for a single frame is approximately 100 ms, while the average trajectory deviation is 0.689 m. This performance meets the real-time processing requirements for low-speed vehicles, enabling unmanned vehicles to execute tracking tasks in temporary road environments.
Title: An End-to-End autonomous driving model based on visual perception for temporary roads.
Journal: PeerJ Computer Science, volume 11, e3152
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453871/pdf/
Pub Date: 2025-08-29; eCollection Date: 2025-01-01; DOI: 10.7717/peerj-cs.3041
Oscar Peña-Cáceres, Antoni Mestre, Manoli Albert, Vicente Pelechano, Miriam Gil
In smart environments, autonomous systems often adapt their behavior to the context, and although such adaptations are generally beneficial, they may cause users to struggle to understand or trust them. To address this, we propose an explanation generation system that produces natural language descriptions (explanations) to clarify the adaptive behavior of smart home systems at runtime. These explanations are customized based on user characteristics and the contextual information derived from the user's interactions with the system. Our approach leverages a prompt-based strategy using a fine-tuned large language model, guided by a modular template that integrates key data such as the type of explanation to be generated, user profile, runtime system information, interaction history, and the specific nature of the system adaptation. As a preliminary step, we also present a conceptual model that characterizes explanations in the domain of autonomous systems by defining their core concepts. Finally, we evaluate the user experience of the generated explanations through an experiment involving 118 participants. Results show that the generated explanations are perceived positively and with a high level of acceptance.
Title: Automatic generation of explanations in autonomous systems: enhancing human interaction in smart home environments.
Journal: PeerJ Computer Science, volume 11, e3041
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453768/pdf/
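The smart home abstract mentions a modular prompt template that combines the explanation type, user profile, runtime system information, interaction history, and the system adaptation. The sketch below shows how such a template might be assembled before being sent to a language model. The class name, field names, section labels, and wording are all hypothetical; the study's actual template is not given in the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class ExplanationRequest:
    """Hypothetical container for the template slots named in the abstract."""
    explanation_type: str
    user_profile: str
    runtime_info: str
    adaptation: str
    interaction_history: list = field(default_factory=list)


def build_prompt(req):
    """Assemble the slots into a single prompt string, one labeled section per slot."""
    history = "\n".join(f"- {item}" for item in req.interaction_history) or "- (none)"
    sections = [
        ("Explanation type", req.explanation_type),
        ("User profile", req.user_profile),
        ("Runtime system information", req.runtime_info),
        ("Interaction history", "\n" + history),
        ("System adaptation", req.adaptation),
    ]
    body = "\n".join(f"{name}: {value}" for name, value in sections)
    return "Explain the following smart home adaptation to the user.\n" + body


if __name__ == "__main__":
    request = ExplanationRequest(
        explanation_type="why",
        user_profile="non-technical adult",
        runtime_info="living room temperature 27 C, windows closed",
        adaptation="air conditioning switched on automatically",
        interaction_history=["user lowered thermostat yesterday at 18:00"],
    )
    print(build_prompt(request))
```

Keeping each slot as a separate labeled section makes it straightforward to swap one input, such as the user profile, without touching the rest of the prompt.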
Pub Date: 2025-08-29; eCollection Date: 2025-01-01; DOI: 10.7717/peerj-cs.3095
Davide Chicco, Giuseppe Sabino, Luca Oneto, Giuseppe Jurman
Clustering methods are unsupervised machine learning techniques that aggregate data points into specific groups, called clusters, according to specific criteria defined by the clustering algorithm employed. Since clustering methods are unsupervised, no ground truth or gold standard information is available to assess their results, making it challenging to know whether the results obtained are good or not. In this context, several internal clustering scores are available, such as the Silhouette coefficient, Calinski-Harabasz index, Davies-Bouldin index, Dunn index, Gap statistic, and Shannon entropy, just to mention a few. Although popular, these internal clustering scores work well only when used to assess convex-shaped and well-separated clusters, and they fail when used to evaluate concave-shaped and nested clusters. In these concave-shaped and density-based cases, other coefficients can be informative: Density-Based Clustering Validation Index (DBCVI), Composed Density between and within clusters Index (CDbw), Density Cluster Separability Index (DCSI), and Validity Index for Arbitrary-Shaped Clusters based on the kernel density estimation (VIASCKDE). In this study, we describe the DBCV index precisely, and compare its outcomes with the outcomes obtained by CDbw, DCSI, and VIASCKDE on several artificial datasets and on real-world medical datasets derived from electronic health records, produced by density-based clustering methods such as density-based spatial clustering of applications with noise (DBSCAN). To do so, we propose an innovative approach based on worsening or improving the clustering result, rather than searching for the "right" number of clusters as many studies do. Moreover, we also recommend open software packages in R and Python for its usage. Our results demonstrate the higher reliability of the DBCV index over CDbw, DCSI, and VIASCKDE when assessing concave-shaped, nested clustering results.
Title: The DBCV index is more informative than DCSI, CDbw, and VIASCKDE indices for unsupervised clustering internal assessment of concave-shaped and density-based clusters.
Journal: PeerJ Computer Science, volume 11, e3095
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453699/pdf/
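The clustering abstract states that popular internal scores such as the Silhouette coefficient break down on nested clusters. The sketch below illustrates that point on a toy case chosen for this example (not one of the study's datasets): two concentric rings labeled correctly as two clusters. The helper names and the ring sizes are assumptions, and the silhouette is implemented directly from its standard definition rather than taken from a library.

```python
import math
from statistics import mean


def ring(radius, n):
    """n points evenly spaced on a circle of the given radius, centered at the origin."""
    return [
        (radius * math.cos(2 * math.pi * k / n), radius * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


def silhouette_values(points, labels):
    """Per-point silhouette s = (b - a) / max(a, b); assumes every cluster has >= 2 points."""
    values = []
    for i, p in enumerate(points):
        dists = {}
        for j, q in enumerate(points):
            if i != j:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        a = mean(dists[labels[i]])  # mean distance to the point's own cluster
        b = min(mean(d) for lab, d in dists.items() if lab != labels[i])  # nearest other cluster
        values.append((b - a) / max(a, b))
    return values


inner, outer = ring(1.0, 8), ring(5.0, 16)
points = inner + outer
labels = [0] * len(inner) + [1] * len(outer)
scores = silhouette_values(points, labels)
inner_mean = mean(scores[: len(inner)])
outer_mean = mean(scores[len(inner):])

if __name__ == "__main__":
    print(f"inner ring mean silhouette: {inner_mean:.3f}")
    print(f"outer ring mean silhouette: {outer_mean:.3f}")
```

Points on the outer ring are, on average, farther from one another than from the inner ring, so their silhouette values come out negative even though the labeling is the intended one. This is the kind of failure that motivates density-based indices such as DBCV.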
Pub Date: 2025-08-29; eCollection Date: 2025-01-01; DOI: 10.7717/peerj-cs.3141
Aslı Bay
Operations on sensitive datasets from different parties are essential for various practical applications, such as verifying shopping lists or enforcing no-fly lists. Traditional methods often require one party to access both datasets, which poses privacy concerns. Private set operations provide a solution by enabling these functions without revealing the data involved. However, protocols involving three or more parties are generally much slower than unsecured methods. Outsourced private set operations, where computations are delegated to a non-colluding server, can significantly improve performance, though current protocols have not fully leveraged this assumption. We propose a new protocol that removes the need for public-key cryptography. Our non-interactive set intersection protocol relies solely on the security of an extendable output function, achieving high efficiency. Even in a ten-client setting with 16,384-element sets, the intersection can be computed in under 54 s without communication overhead. Our results indicate that substantial performance improvements can be made without sacrificing privacy, presenting a practical and efficient approach to private set operations.
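To make the idea of a delegated intersection built on an extendable output function (XOF) more concrete, here is a minimal toy sketch. It is not the protocol from the paper, whose details the abstract does not give. It assumes the clients share a secret key that the server never sees, and it uses SHAKE-128 from Python's `hashlib` as the XOF. The names `mask`, `client_prepare`, and `server_intersect` are hypothetical, and the sketch makes no claim to match the paper's security guarantees or performance.

```python
import hashlib
import secrets


def mask(key: bytes, element: str, out_len: int = 16) -> bytes:
    """Mask one element with the XOF, keyed by a secret shared among clients."""
    xof = hashlib.shake_128()
    # Length-prefix the key so that key and element bytes cannot be confused.
    xof.update(len(key).to_bytes(2, "big") + key + element.encode("utf-8"))
    return xof.digest(out_len)  # an XOF lets the caller choose the output length


def client_prepare(key: bytes, items: set) -> dict:
    """Client side: map each masked value back to its plaintext element."""
    return {mask(key, item): item for item in items}


def server_intersect(masked_sets: list) -> set:
    """Server side: intersect masked values only; plaintexts are never seen."""
    common = set(masked_sets[0])
    for masked in masked_sets[1:]:
        common &= set(masked)
    return common


# Toy run with three clients (assumption: the key was agreed out of band).
key = secrets.token_bytes(32)
clients = [
    {"alice", "bob", "carol"},
    {"bob", "carol", "dave"},
    {"carol", "bob", "erin"},
]
tables = [client_prepare(key, items) for items in clients]
common_masks = server_intersect([set(table) for table in tables])
# Any client can translate the returned masks back to its own elements.
result = {tables[0][m] for m in common_masks}
print(sorted(result))
```

In this toy version each client sends one message to the server and only symmetric primitives are used, which loosely mirrors the non-interactive, public-key-free setting the abstract describes. A real protocol must also limit what the server can learn from comparing masked sets, and this sketch does not attempt that.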
{"title":"Delegated multi-party private set intersections from extendable output functions.","authors":"Aslı Bay","doi":"10.7717/peerj-cs.3141","DOIUrl":"10.7717/peerj-cs.3141","url":null,"abstract":"<p><p>Operations on sensitive datasets from different parties are essential for various practical applications, such as verifying shopping lists or enforcing no-fly lists. Traditional methods often require one party to access both datasets, which poses privacy concerns. Private set operations provide a solution by enabling these functions without revealing the data involved. However, protocols involving three or more parties are generally much slower than unsecured methods. Outsourced private set operations, where computations are delegated to a non-colluding server, can significantly improve performance, though current protocols have not fully leveraged this assumption. We propose a new protocol that removes the need for public-key cryptography. Our non-interactive set intersection protocol relies solely on the security of an extendable output function, achieving high efficiency. Even in a ten-client setting with 16,384-element sets, the intersection can be computed in under 54 s without communication overhead. Our results indicate that substantial performance improvements can be made without sacrificing privacy, presenting a practical and efficient approach to private set operations.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e3141"},"PeriodicalIF":2.5,"publicationDate":"2025-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453744/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145132559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-08-29; eCollection Date: 2025-01-01; DOI: 10.7717/peerj-cs.3052
Feras Al-Obeidat, Adnan Amin, Ahmed Shuhaiber, Inam Ul Haq
The industrial Internet of Things (IIoT) and digital twins are redefining how digital models and physical systems interact. IIoT connects physical intelligence, and digital twins virtually represent their physical counterparts. With the rapid growth of Edge-IIoT, it is crucial to create security and privacy regulations to prevent vulnerabilities and threats such as distributed denial-of-service (DDoS) attacks. DDoS attacks use botnets to overload the target system with requests. In this study, we introduce a novel approach for detecting DDoS attacks in a dataset generated from an Edge-IIoT digital twin environment. The proposed approach is designed to retain already learned knowledge and to adapt to new models continuously without retraining the deep learning model. The target dataset is publicly available and contains 157,600 samples. The proposed models M1, M2, and M3 obtained precision scores of 0.94, 0.93, and 0.93; recall scores of 0.91, 0.97, and 0.99; F1-scores of 0.93, 0.95, and 0.96; and accuracy scores of 0.93, 0.95, and 0.96, respectively. The results demonstrated that transferring previous model knowledge to the next model consistently outperformed baseline approaches.
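As a quick sanity check on the reported metrics, the sketch below recomputes each F1-score from the precision and recall quoted above. It assumes F1 is the plain harmonic mean of those two figures; the abstract does not say how the scores were averaged, so small differences are expected. The function name `f1_score` and the `reported` dictionary are illustrative and not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as quoted in the abstract.
reported = {
    "M1": (0.94, 0.91, 0.93),
    "M2": (0.93, 0.97, 0.95),
    "M3": (0.93, 0.99, 0.96),
}

for name, (precision, recall, f1_reported) in reported.items():
    f1 = f1_score(precision, recall)
    print(f"{name}: recomputed F1 = {f1:.3f}, reported F1 = {f1_reported:.2f}")
```

Under that assumption the recomputed values fall within 0.01 of the reported F1-scores, which is consistent with the inputs having been rounded to two decimals.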
{"title":"DDoS attack detection in Edge-IIoT digital twin environment using deep learning approach.","authors":"Feras Al-Obeidat, Adnan Amin, Ahmed Shuhaiber, Inam Ul Haq","doi":"10.7717/peerj-cs.3052","DOIUrl":"10.7717/peerj-cs.3052","url":null,"abstract":"<p><p>The industrial Internet of Things (IIoT) and digital twins are redefining how digital models and physical systems interact. IIoT connects physical intelligence, and digital twins virtually represent their physical counterparts. With the rapid growth of Edge-IIoT, it is crucial to create security and privacy regulations to prevent vulnerabilities and threats (<i>i.e</i>., distributed denial of service (DDoS)). DDoS attacks use botnets to overload the target system with requests. In this study, we introduce a novel approach for detecting DDoS attacks in an Edge-IIoT digital twin-based generated dataset. The proposed approach is designed to retain already learned knowledge and easily adapt to new models in a continuous manner without retraining the deep learning model. The target dataset is publicly available and contains 157,600 samples. The proposed models M1, M2, and M3 obtained precision scores of 0.94, 0.93, and 0.93; recall scores of 0.91, 0.97, and 0.99; F1-scores of 0.93, 0.95, and 0.96; and accuracy scores of 0.93, 0.95, and 0.96, respectively. The results demonstrated that transferring previous model knowledge to the next model consistently outperformed baseline approaches.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e3052"},"PeriodicalIF":2.5,"publicationDate":"2025-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453828/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145132478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}