Chai Wen Chuah, Nur Ziadah Harun, Isredza Rahmi A. Hamid
A key derivation function is a cryptographic algorithm that transforms a private string and public strings into one or more cryptographic keys. These keys are essential for protecting electronic data during transmission over the internet. The function is designed around a computational extractor and a pseudorandom expander and is typically constructed from various cryptographic ciphers, such as stream ciphers, keyed-hash message authentication codes, and block ciphers. Secure and efficient key derivation function designs are essential to the development of numerous security systems: a vulnerable key derivation function could give attackers the ability to compromise an otherwise secure cryptosystem. This research proposes a different approach that combines two different cryptographic ciphers to develop key derivation functions. The findings demonstrate that a computational extractor based on keyed-hash message authentication codes, paired with a pseudorandom expander based on stream ciphers, maintains the highest level of security while also offering shorter execution times than existing key derivation function schemes.
Chai Wen Chuah, Nur Ziadah Harun, Isredza Rahmi A. Hamid (2024). "Key derivation function: key-hash based computational extractor and stream based pseudorandom expander." PeerJ Computer Science. doi:10.7717/peerj-cs.2249. Published 2024-08-23.
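The extract-then-expand structure described above can be sketched in a few lines of Python. This is a generic HKDF-style illustration (RFC 5869), not the paper's construction: the extractor uses a keyed-hash message authentication code as in the proposed scheme, but the expander below also uses HMAC rather than a stream cipher, since Python's standard library ships no stream cipher.

```python
import hashlib
import hmac

def extract(salt: bytes, ikm: bytes) -> bytes:
    """Computational extractor: condense the private input keying
    material (ikm) and a public salt into a pseudorandom key (PRK)."""
    return hmac.new(salt, ikm, hashlib.sha256).digest()

def expand(prk: bytes, info: bytes, length: int) -> bytes:
    """Pseudorandom expander: stretch the PRK into `length` bytes of
    output keying material, bound to a public context string `info`."""
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

prk = extract(b"public salt", b"private input keying material")
key = expand(prk, b"session-key context", 32)
```

The same PRK can be expanded under different `info` strings to derive independent keys, which is exactly the property the pseudorandom expander provides.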
Background This study was motivated by the increasing popularity of Massive Open Online Courses (MOOCs) and the challenges they face, such as high dropout and failure rates. Existing work primarily focused on predicting student dropout, but this study aimed to go beyond that by predicting both student dropout and course results. By using machine learning models and analyzing various data sources, the study sought to improve our understanding of the factors influencing student success in MOOCs. Objectives The primary aim of this research was to develop accurate predictions of students’ course outcomes in MOOCs, specifically whether they would pass or fail. Unlike previous studies, this study took into account demographic, assessment, and student interaction data to provide comprehensive predictions. Methods The study utilized demographic, assessment, and student interaction data to develop predictive models. Two machine learning methods, logistic regression and random forest classification, were employed to predict students’ course outcomes. The accuracy of the models was evaluated on four-class classification (predicting four possible outcomes) and two-class classification (predicting pass or fail). Results and Conclusions The study found that simple indicators, such as a student’s activity level on a given day, could be as effective as more complex data combinations or personal information in predicting student success. The logistic regression model achieved an accuracy of 72.1% for four-class classification and 92.4% for two-class classification, while the random forest classifier achieved an accuracy of 74.6% for four-class classification and 95.7% for two-class classification. These findings highlight the potential of machine learning models in predicting and understanding students’ course outcomes in MOOCs, offering valuable insights for improving student engagement and success in online learning environments.
Hosam A. Althibyani (2024). "Predicting student success in MOOCs: a comprehensive analysis using machine learning models." PeerJ Computer Science. doi:10.7717/peerj-cs.2221. Published 2024-08-23.
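The two-class (pass/fail) setting can be illustrated with a bare-bones logistic regression in plain Python. The single feature (a student's daily activity level) and the toy data below are invented for illustration; the study's models were trained on demographic, assessment, and interaction data.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train(xs, ys, lr=0.5, epochs=1000):
    """Fit a one-feature logistic regression by stochastic gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)
            w -= lr * (p - y) * x
            b -= lr * (p - y)
    return w, b

# Hypothetical data: hours of course activity per day -> pass (1) / fail (0).
activity = [0.2, 0.5, 0.9, 1.5, 2.0, 2.8, 3.5, 4.0]
outcome  = [0,   0,   0,   0,   1,   1,   1,   1]
w, b = train(activity, outcome)

def predict(x: float) -> int:
    """Predict pass (1) or fail (0) from daily activity."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0
```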
Sheharyar Khan, Zheng Jiangbin, Farhan Ullah, Muhammad Pervez Akhter, Sohrab Khan, Fuad A. Awwad, Emad A.A. Ismail
In the distributed computing era, cloud computing has completely changed organizational operations by facilitating simple access to resources. However, the rapid development of the Internet of Things (IoT) has led to collaborative computing, which raises scalability and security challenges. To fully realize the potential of IoT in smart home technologies, there is still a need for strong data security solutions, which are essential for dynamic offloading in conjunction with edge, fog, and cloud computing. This research on smart home challenges covers in-depth examinations of data security, privacy, processing speed, storage capacity restrictions, and analytics inside networked IoT devices. We introduce the Trusted IoT Big Data Analytics (TIBDA) framework as a comprehensive solution to reshape smart living. Our primary focus is mitigating pervasive data security and privacy issues. TIBDA incorporates robust trust mechanisms, prioritizing data privacy and reliability for secure processing and user information confidentiality within the smart home environment. We achieve this by employing a hybrid cryptosystem that combines Elliptic Curve Cryptography (ECC), Post-Quantum Cryptography (PQC), and Blockchain technology (BCT) to protect user privacy and confidentiality. Additionally, we comprehensively compared four prominent artificial intelligence anomaly detection algorithms (Isolation Forest, Local Outlier Factor, One-Class SVM, and Elliptic Envelope). We utilized machine learning classification algorithms (random forest, k-nearest neighbors, support vector machines, linear discriminant analysis, and quadratic discriminant analysis) for detecting malicious and non-malicious activities in smart home systems. Furthermore, at the core of the research, the TIBDA framework uses a dynamic artificial neural network (ANN) algorithm to design a hybrid computing system that integrates edge, fog, and cloud architecture and efficiently supports numerous users while processing data from IoT devices in real time. The analysis shows that TIBDA significantly outperforms comparable systems across various metrics. In terms of response time, TIBDA demonstrated a reduction of 10–20% compared to the other systems under varying user loads, device counts, and transaction volumes. Regarding security, TIBDA’s AUC values were consistently higher by 5–15%, indicating superior protection against threats. Additionally, TIBDA exhibited the highest trustworthiness, with an uptime percentage 10–12% greater than its competitors. TIBDA’s Isolation Forest algorithm achieved an accuracy of 99.30%, and the random forest algorithm achieved an accuracy of 94.70%, outperforming other methods by 8–11%. Furthermore, our ANN-based offloading decision-making model achieved a validation accuracy of 99% and reduced loss to 0.11, demonstrating significant improvements in resource utilization and system performance.
Sheharyar Khan, Zheng Jiangbin, Farhan Ullah, Muhammad Pervez Akhter, Sohrab Khan, Fuad A. Awwad, Emad A.A. Ismail (2024). "Hybrid computing framework security in dynamic offloading for IoT-enabled smart home system." PeerJ Computer Science. doi:10.7717/peerj-cs.2211. Published 2024-08-23.
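Dynamic offloading across edge, fog, and cloud tiers can be sketched with a simple threshold heuristic; the TIBDA framework learns this decision with an ANN, so the rule and thresholds below are purely illustrative stand-ins.

```python
def offload(task_size_mb: float, latency_budget_ms: float,
            edge_load: float) -> str:
    """Pick a compute tier for a task: small, latency-critical work stays
    at the edge, mid-sized work goes to fog, and the rest goes to cloud.
    Thresholds are invented for illustration."""
    if latency_budget_ms < 50 and task_size_mb < 10 and edge_load < 0.8:
        return "edge"
    if latency_budget_ms < 200 and task_size_mb < 100:
        return "fog"
    return "cloud"
```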
A pre-touch reaction, i.e., a response before physical contact occurs, is an essential factor for natural human-agent interaction. Although numerous studies have investigated the effectiveness of pre-touch reaction design for virtual agents in virtual reality (VR) environments and robots in physical environments, one area remains underexplored: displayed agents, i.e., on-screen computer graphics agents. To design an appropriate pre-touch reaction for such a displayed agent, this article focused on the display’s physical boundary as a criterion for the agent’s pre-touch reaction. This article developed a displayed-agent system that detects both touch events on the screen and the pre-touch behaviors of people interacting around the display. This study examined the effectiveness of the displayed agent’s pre-touch reactions through experiments with human participants using the developed system. The findings revealed that people significantly preferred pre-touch reactions over post-touch reactions in terms of perceived feelings.
Masahiro Shiomi (2024). "Pre-touch reaction is preferred over post-touch reaction in interaction with displayed agent." PeerJ Computer Science. doi:10.7717/peerj-cs.2277. Published 2024-08-23.
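At its core, a pre-touch reaction reduces to classifying a tracked hand relative to the display's physical boundary before any contact event fires. The 15 cm threshold and the state names below are assumptions for illustration, not the paper's system.

```python
def classify_hand(distance_to_screen_cm: float, touching: bool,
                  pre_touch_threshold_cm: float = 15.0) -> str:
    """Label a tracked hand as idle, pre-touch, or touch."""
    if touching:                      # a contact event has fired on the screen
        return "touch"
    if distance_to_screen_cm <= pre_touch_threshold_cm:
        return "pre-touch"            # close enough to react before contact
    return "idle"
```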
In this study, we examine different approaches to the presentation of Y coordinates in mobile auditory graphs, including the representation of negative numbers. These studies involved both normally sighted and visually impaired users, as there are applications where normally sighted users might employ auditory graphs, such as the unseen monitoring of stocks or of fuel consumption in a car. Multi-reference sonification schemes are investigated as a means of improving the performance of mobile non-visual point estimation tasks. The results demonstrated that both populations are able to carry out point estimation tasks with a good level of performance when presented with auditory graphs using multiple reference tones. Additionally, visually impaired participants performed better on graphs represented in this format than normally sighted participants. This work also implements the component representation approach for negative numbers: the magnitude uses the same positive mapping reference for the digit, and a sign is added before the digit, which leads to better accuracy for the polarity sign. This work contributes to the design process of mobile auditory devices in human-computer interaction and proposes a methodological framework for improving auditory graph performance in graph reproduction.
Zico Pratama Putra, Deni Setiawan (2024). "Representation of negative numbers: point estimation tasks using multi-reference sonification mappings." PeerJ Computer Science. doi:10.7717/peerj-cs.2275. Published 2024-08-23.
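The component representation for negative numbers can be sketched as a mapping function: the magnitude is sonified with the same positive pitch reference, and a separate sign cue precedes the digit tone. The frequency range and cue name below are assumptions for illustration.

```python
def sonify(value: float, y_max: float = 100.0,
           f_min: float = 220.0, f_max: float = 880.0):
    """Map a Y value to (sign_cue, frequency_hz). The sign cue is played
    before the digit tone; the magnitude reuses the positive mapping."""
    sign_cue = "sign_marker" if value < 0 else None
    magnitude = min(abs(value), y_max)
    frequency = f_min + magnitude / y_max * (f_max - f_min)
    return sign_cue, frequency
```

Because -50 and +50 map to the same 550 Hz tone, polarity is carried entirely by the leading sign cue, matching the component approach described above.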
In this study, a novel algorithm named the Edge-enhanced Generative Adversarial Network (EGAN) is proposed to address the issues of noise corruption and edge fuzziness in the super-resolution of remote sensing images. Building upon the baseline model, Deep Blind Super-Resolution GAN (DBSR-GAN), an edge enhancement module is introduced to enhance the edge information of the images. To enlarge the receptive field of the algorithm, the Mask branch within the edge enhancement structure is further optimized. Moreover, an image-consistency loss is introduced to guide edge reconstruction, and subpixel convolution is employed for upsampling, resulting in sharper edge contours and more consistent stylized results. To tackle the low utilization of global information and the reconstruction artifacts in super-resolved remote sensing images, an extended algorithm named Nonlocal Module and Artifact Discrimination EGAN (END-GAN) is proposed. END-GAN introduces a nonlocal module, based on EGAN, in the feature extraction stage, enabling better utilization of the internal correlations of remote sensing images and enhancing the algorithm’s capability to extract global target features. Additionally, an artifact discrimination method is implemented to distinguish artifacts from real features in reconstructed images, and the algorithm is optimized by introducing an artifact discrimination loss alongside the original loss function. Experimental comparisons on two remote sensing image datasets, NWPUVHR-10 and UCAS-AOD, demonstrate that the proposed algorithms significantly improve the evaluation indexes.
Yang Liu, Hu Xu, Xiaodong Shi (2024). "Reconstruction of super-resolution from high-resolution remote sensing images based on convolutional neural networks." PeerJ Computer Science. doi:10.7717/peerj-cs.2218. Published 2024-08-23.
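Edge enhancement of the kind EGAN's module performs starts from an edge map. A plain-Python Sobel operator (a generic edge detector, not the paper's module) illustrates the boundary information involved:

```python
def sobel_edges(img):
    """img: 2D list of grayscale values; returns gradient magnitudes
    for interior pixels (border pixels are left at zero)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal and vertical Sobel responses at (x, y).
            gx = (img[y-1][x+1] + 2*img[y][x+1] + img[y+1][x+1]
                  - img[y-1][x-1] - 2*img[y][x-1] - img[y+1][x-1])
            gy = (img[y+1][x-1] + 2*img[y+1][x] + img[y+1][x+1]
                  - img[y-1][x-1] - 2*img[y-1][x] - img[y-1][x+1])
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out

# A vertical step edge: strong response along the boundary columns.
step = [[0, 0, 1, 1]] * 4
edges = sobel_edges(step)
```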
Hong Zhao, Jinhai Huang, Wengai Li, Zhaobin Chang, Weijie Wang
The generator, which combines a convolutional neural network (CNN) and a Transformer as its core modules, serves as the primary model for handwriting font generation networks and demonstrates effective performance. However, such models still extract insufficient features for the overall structure of the font, stroke thickness, and stroke curvature, resulting in subpar detail in the generated fonts. To solve these problems, we propose a handwritten font generation model based on Pyramid Squeeze Attention, called PSA-HWT. The PSA-HWT model is divided into two parts: an encoder and a decoder. In the encoder, a multi-branch structure extracts spatial information at different scales from the input feature map, achieving multi-scale feature extraction. This helps capture the semantic information and global structure of the font, aiding the generation model in understanding fine-grained features such as the shape, thickness, and curvature of strokes. The decoder uses a self-attention mechanism to capture dependencies across positions in the input sequence. This helps the model understand the relationship between the strokes or characters being generated and the handwritten font as a whole, ensuring the overall coherence of the generated handwritten text. Experimental results on the IAM dataset demonstrate that PSA-HWT achieves a 16.35% decrease in Fréchet inception distance (FID) and a 13.09% decrease in Geometry Score (GS) compared to current advanced methods. This indicates that PSA-HWT generates handwritten fonts of higher quality, making it more practically valuable.
Hong Zhao, Jinhai Huang, Wengai Li, Zhaobin Chang, Weijie Wang (2024). "PSA-HWT: handwritten font generation based on pyramid squeeze attention." PeerJ Computer Science. doi:10.7717/peerj-cs.2261. Published 2024-08-23.
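The decoder's self-attention reduces to a scaled dot-product over token vectors. The sketch below uses identity query/key/value projections to keep it minimal; real Transformer layers learn those projections.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X):
    """Scaled dot-product self-attention with identity projections:
    each row of X attends over every row of X."""
    d = len(X[0])
    scores = [[sum(q * k for q, k in zip(X[i], X[j])) / math.sqrt(d)
               for j in range(len(X))] for i in range(len(X))]
    weights = [softmax(row) for row in scores]
    return [[sum(w * X[j][c] for j, w in enumerate(row)) for c in range(d)]
            for row in weights]

# Each output row is a convex combination of all input rows.
attended = self_attention([[1.0, 0.0], [0.0, 1.0]])
```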
Surface defect inspection methods have proven effective in addressing casting quality control tasks. However, traditional inspection methods often struggle to achieve high-precision detection of surface defects in castings with similar characteristics and minor scales. The study introduces DES-YOLO, a novel real-time method for detecting surface defects on castings. In the DES-YOLO model, we incorporate the DSC-Darknet backbone network and a global attention mechanism (GAM) module to enhance the identification of defect target features. These additions are essential for overcoming the challenge posed by the high similarity among defect characteristics, such as shrinkage holes and slag holes, which can result in decreased detection accuracy. An enhanced pyramid pooling module is also introduced to improve feature representation for small defective parts through multi-layer pooling. We integrate Slim-Neck and the SIoU bounding box regression loss function for real-time detection in actual production scenarios. These components reduce memory overhead and enable real-time detection of surface defects in castings. Experimental findings demonstrate that the DES-YOLO model achieves a mean average precision (mAP) of 92.6% on the CSD-DET dataset and a single-image inference time of 3.9 milliseconds. The proposed method proves capable of swiftly and accurately accomplishing real-time detection of surface defects in castings.
"DES-YOLO: a novel model for real-time detection of casting surface defects" by Chengjun Wang, Jiaqi Hu, Chaoyu Yang, Peng Hu. PeerJ Computer Science, 2024-08-22. https://doi.org/10.7717/peerj-cs.2224
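As background for the SIoU bounding-box regression loss named above: SIoU extends plain intersection-over-union with angle, distance, and shape penalty terms. The following is a minimal sketch of the underlying IoU computation only; the function name and the (x1, y1, x2, y2) box format are illustrative choices, not taken from the paper.

```python
def box_iou(a, b):
    """Intersection-over-Union of two axis-aligned boxes in (x1, y1, x2, y2) form.

    SIoU-style losses start from 1 - IoU and add angle, distance, and
    shape penalties; this sketch shows only the IoU term.
    """
    # Coordinates of the intersection rectangle (empty if boxes are disjoint).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For example, two unit-overlap 2x2 boxes yield IoU = 1/7, identical boxes yield 1.0, and disjoint boxes yield 0.0.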
To address the challenge of suboptimal object detection outcomes stemming from the deformability of marine flexible biological entities, this study introduces an algorithm tailored for detecting marine flexible biological targets. Initially, we compiled a dataset of marine flexible biological subjects and applied a Contrast Limited Adaptive Histogram Equalization (CLAHE) algorithm, supplemented with a boundary detection enhancement module, to refine underwater image quality and accentuate the distinction between image foregrounds and backgrounds. This enhancement mitigates the foreground-background similarity issue encountered when detecting marine flexible biological entities. Moreover, the proposed adaptation incorporates a Deformable Convolutional Network (DCN) module in lieu of the C2f module within the YOLOv8n framework, augmenting the model's proficiency in capturing geometric transformations and concentrating on pivotal areas. The Neck module is enhanced with the RepBi-PAN architecture, bolstering its capability to amalgamate and emphasize essential characteristics of flexible biological targets. To improve the model's feature information processing efficiency, we integrated the SimAM attention mechanism. Finally, to diminish the adverse effects of inferior-quality labels within the dataset, we adopt WIoU (Wise-IoU) as the bounding-box loss function, which refines the quality assessment of anchor boxes. Simulation experiments show that, compared to the conventional YOLOv8n algorithm, our method markedly elevates the precision of marine flexible biological target detection.
"Research on marine flexible biological target detection based on improved YOLOv8 algorithm" by Yu Tian, Yanwen Liu, Baohang Lin, Peng Li. PeerJ Computer Science, 2024-08-22. https://doi.org/10.7717/peerj-cs.2271
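The CLAHE preprocessing step named above builds on ordinary histogram equalization: CLAHE applies the same mapping per tile, with a clip limit on each tile's histogram and bilinear interpolation between tiles. A minimal numpy sketch of the global (non-adaptive) base operation is shown below; the function name is illustrative and the paper's own pipeline may differ.

```python
import numpy as np

def equalize_hist(img):
    """Global histogram equalization of a uint8 grayscale image.

    CLAHE applies this same CDF-remapping idea per tile, clipping each
    tile's histogram to limit contrast amplification, then interpolating
    between neighboring tiles; this sketch shows only the global version.
    """
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]  # first nonzero CDF value
    # Stretch the CDF to span the full 0..255 range; clip guards the
    # (never-indexed) entries below the first occupied bin.
    lut = np.clip(
        np.round((cdf - cdf_min) / (img.size - cdf_min) * 255), 0, 255
    ).astype(np.uint8)
    return lut[img]
```

For instance, a 2x2 image with values 0, 64, 128, 255 maps to 0, 85, 170, 255, i.e., the occupied gray levels are spread evenly over the full range.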
Collision avoidance is a crucial component of any decentralized multi-agent navigation system. Currently, most existing multi-agent collision-avoidance methods either do not take into account the kinematic constraints of the agents (i.e., they assume that an agent can change its direction of movement instantaneously) or are tailored to specific kinematic motion models (e.g., car-like robots). In this work, we suggest a novel generalized approach to decentralized multi-agent collision avoidance that can be applied to agents with arbitrary affine kinematic motion models, including but not limited to differential-drive robots, car-like robots, and quadrotors. The suggested approach is based on MPPI, a seminal sampling-based model predictive control algorithm that originally solves a single-agent problem. We enhance it by introducing safe distributions for the multi-agent setting that are derived from the Optimal Reciprocal Collision Avoidance (ORCA) linear constraints, an established approach from the multi-agent navigation domain. We rigorously show that such distributions can be found by solving a specific convex optimization problem. We also provide a theoretical justification that the resultant algorithm guarantees safety, i.e., that at each time step the control suggested by our algorithm does not lead to a collision. We empirically evaluate the proposed method in simulation experiments that involve comparison with the state of the art in different setups. We find that in many cases, the suggested approach outperforms competitors and solves problem instances that the other methods cannot successfully solve.
"Model predictive path integral for decentralized multi-agent collision avoidance" by Stepan Dergachev, Konstantin Yakovlev. PeerJ Computer Science, 2024-08-21. https://doi.org/10.7717/peerj-cs.2220
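For readers unfamiliar with the base algorithm: below is a minimal single-agent MPPI step in numpy, under the simplifying assumptions of unconstrained Gaussian control sampling and generic dynamics/cost callbacks. The paper's actual contribution, restricting the sampling distribution with ORCA-derived linear constraints for multi-agent safety, is deliberately omitted; all names and parameters here are illustrative.

```python
import numpy as np

def mppi_step(x0, U, dynamics, cost, K=256, sigma=0.5, lam=1.0, rng=None):
    """One vanilla MPPI update.

    Samples K noisy control sequences around the nominal plan U (H x m),
    rolls each out through `dynamics`, and returns the cost-weighted
    (softmax) average plan. The paper's method additionally constrains
    the sampling distribution via ORCA half-planes; this sketch does not.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    H, m = U.shape
    noise = rng.normal(0.0, sigma, size=(K, H, m))
    costs = np.zeros(K)
    for k in range(K):
        x = x0.copy()
        for t in range(H):
            x = dynamics(x, U[t] + noise[k, t])
            costs[k] += cost(x)
    # Exponentially weight low-cost rollouts (costs shifted for stability).
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()
    return U + np.einsum("k,khm->hm", w, noise)
```

With a single-integrator model and a quadratic goal cost, one such update nudges a zero-control nominal plan toward the goal, lowering the rollout cost relative to the unmodified plan.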