Data trace as the scientific foundation for trusted metrological data: a review for future metrology direction.
Zhanshuo Cao, Boyong Gao, Zilong Liu, Xingchuang Xiong, Bin Wang, Chenbo Pei
PeerJ Computer Science 11:e3106. Pub Date: 2025-08-14. DOI: 10.7717/peerj-cs.3106
In the context of the digital transformation of metrology, ensuring the trustworthiness and integrity of measurement data during its generation, transmission, and storage (i.e., trustworthy detection of measurement data) has become a critical challenge. Data traces are residual marks left during data processing that help identify malicious activities targeting measurement data. These traces are especially important when the trust and integrity of potential data evidence are under threat. To this end, this article systematically reviews the relevant core techniques and analyzes detection methods across the stages of the data lifecycle, evaluating their applicability and limitations in identifying data tampering, unauthorized access, and anomalous operations. The findings suggest that trace detection technologies can enhance the traceability and transparency of metrological data, thereby providing technical support for building a trustworthy digital metrology system. This review lays a theoretical foundation for future research on developing automated anomaly detection models, improving forensic techniques for data tampering in measurement devices, and constructing multi-modal, full-lifecycle traceability frameworks for measurement data. Subsequent studies should focus on aligning these technologies with metrological standards and validating their deployment in real-world measurement instruments.
A comprehensive review of ball detection techniques in sports.
Cristiano Moreira, Lino Ferreira, Paulo Jorge Coelho
PeerJ Computer Science 11:e3079. Pub Date: 2025-08-12. eCollection Date: 2025-01-01. DOI: 10.7717/peerj-cs.3079
Detecting balls in sports plays a pivotal role in enhancing game analysis, providing real-time data for spectators, and improving decision-making and strategic thinking for referees and coaches. This is a highly debated and researched topic, but most works focus on a single sport; effective generalization of one method or algorithm to different sports is much harder to achieve. This article reviews methodologies and advancements in object detection tailored to ball detection across various sports. Traditional computer vision techniques and modern deep learning methods are surveyed, emphasizing their strengths, limitations, and adaptability to diverse game scenarios. The challenges of occlusion, dynamic backgrounds, varying ball sizes, and high-speed movements are identified and discussed. This review aims to consolidate existing knowledge, compare state-of-the-art detection models, highlight pivotal challenges and possible solutions, and propose future research directions. The article underscores the importance of optimizations for accurate and efficient ball detection, setting the foundation for next-generation sports analytics systems.
{"title":"A comprehensive review of ball detection techniques in sports.","authors":"Cristiano Moreira, Lino Ferreira, Paulo Jorge Coelho","doi":"10.7717/peerj-cs.3079","DOIUrl":"10.7717/peerj-cs.3079","url":null,"abstract":"<p><p>Detecting balls in sports plays a pivotal role in enhancing game analysis, providing real-time data for spectators, and improving decision-making and strategic thinking for referees and coaches. This is a highly debated and researched topic, but most works focus on one sport. Effective generalization of a single method or algorithm to different sports is much harder to achieve. This article reviews methodologies and advancements in object detection tailored to ball detection across various sports. Traditional computer vision techniques and modern deep learning methods are visited, emphasizing their strengths, limitations, and adaptability to diverse game scenarios. The challenges of occlusion, dynamic backgrounds, varying ball sizes, and high-speed movements are identified and discussed. This review aims to consolidate existing knowledge, compare state-of-the-art detection models, highlight pivotal challenges and possible solutions, and propose future research directions. The article underscores the importance of optimizations for accurate and efficient ball detection, setting the foundation for next-generation sports analytics systems.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e3079"},"PeriodicalIF":2.5,"publicationDate":"2025-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453710/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145132617","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An enhanced BERT model with improved local feature extraction and long-range dependency capture in promoter prediction for hearing loss.
Jing Sun, Yangfan Huang, Jiale Fu, Li Teng, Xiao Liu, Xiaohua Luo
PeerJ Computer Science 11:e3104. Pub Date: 2025-08-12. eCollection Date: 2025-01-01. DOI: 10.7717/peerj-cs.3104
Promoter prediction plays a key role in understanding gene regulation and in developing gene therapies for complex diseases such as hearing loss (HL). While traditional Bidirectional Encoder Representations from Transformers (BERT) models excel at capturing contextual information, they often have limitations in simultaneously extracting the local sequence features and long-range dependencies inherent in genomic data. To address this challenge, we propose DNABERT-CBL (DNABERT-2_CNN_BiLSTM), an enhanced BERT-based architecture that fuses a convolutional neural network (CNN) and a bidirectional long short-term memory (BiLSTM) layer. The CNN module captures local regulatory features, while the BiLSTM module effectively models long-distance dependencies, enabling efficient integration of global and local features of promoter sequences. The models are optimized using three strategies (individual learning, cross-disease training, and global training), and the contribution of each module is verified by constructing comparison models with different combinations. The experimental results show that DNABERT-CBL outperforms the baseline DNABERT-2_BASE model in hearing loss promoter prediction, with a 20% reduction in loss, a 3.3% improvement in the area under the receiver operating characteristic curve (AUC), and a 5.8% improvement in accuracy at a sequence length of 600 base pairs. In addition, DNABERT-CBL consistently outperforms other state-of-the-art BERT-based genome models on several evaluation metrics, highlighting its superior generalization ability. Overall, DNABERT-CBL provides an effective framework for accurate promoter prediction, offers valuable insights into gene regulatory mechanisms, and supports the development of gene therapies for hearing loss and related diseases.
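For readers who want to see the fusion concretely, the following is a minimal PyTorch sketch of a BERT encoder combined with CNN and BiLSTM heads. It is not the authors' implementation: the encoder stands in for any HuggingFace-style model such as DNABERT-2, and all layer sizes are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): a BERT-style encoder fused with a
# CNN branch for local motifs and a BiLSTM branch for long-range context.
from types import SimpleNamespace
import torch
import torch.nn as nn

class BertCnnBiLstm(nn.Module):
    def __init__(self, encoder, hidden=768, conv_channels=128, lstm_hidden=128):
        super().__init__()
        self.encoder = encoder  # any module returning .last_hidden_state
        self.conv = nn.Conv1d(hidden, conv_channels, kernel_size=5, padding=2)
        self.bilstm = nn.LSTM(hidden, lstm_hidden, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(conv_channels + 2 * lstm_hidden, 2)

    def forward(self, input_ids, attention_mask):
        h = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        local = torch.relu(self.conv(h.transpose(1, 2))).max(dim=2).values  # local motif summary
        ctx, _ = self.bilstm(h)
        global_feat = ctx.mean(dim=1)                                       # pooled long-range context
        return self.classifier(torch.cat([local, global_feat], dim=1))

class DummyEncoder(nn.Module):
    """Stand-in for DNABERT-2 so the shape check below runs without weights."""
    def forward(self, input_ids, attention_mask):
        return SimpleNamespace(last_hidden_state=torch.randn(input_ids.size(0), input_ids.size(1), 768))

model = BertCnnBiLstm(DummyEncoder())
logits = model(torch.zeros(2, 128, dtype=torch.long), torch.ones(2, 128))
print(logits.shape)  # torch.Size([2, 2])
```

The CNN branch max-pools over token positions to summarize local motifs, while the BiLSTM output is mean-pooled as a global representation, mirroring the local/global split described in the abstract.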
{"title":"An enhanced BERT model with improved local feature extraction and long-range dependency capture in promoter prediction for hearing loss.","authors":"Jing Sun, Yangfan Huang, Jiale Fu, Li Teng, Xiao Liu, Xiaohua Luo","doi":"10.7717/peerj-cs.3104","DOIUrl":"10.7717/peerj-cs.3104","url":null,"abstract":"<p><p>Promoter prediction has a key role in helping to understand gene regulation and in developing gene therapies for complex diseases such as hearing loss (HL). While traditional Bidirectional Encoder Representations from Transformers (BERT) models excel in capturing contextual information, they often have limitations in simultaneously extracting local sequence features and long-range dependencies inherent in genomic data. To address this challenge, we propose DNABERT-CBL (DNABERT-2_CNN_BiLSTM), an enhanced BERT-based architecture that fuses a convolutional neural network (CNN) and a bidirectional long and short-term memory (BiLSTM) layer. The CNN module is able to capture local regulatory features, while the BiLSTM module can effectively model long-distance dependencies, enabling efficient integration of global and local features of promoter sequences. The models are optimized using three strategies: individual learning, cross-disease training and global training, and the performance of each module is verified by constructing comparison models with different combinations. The experimental results show that DNABERT-CBL outperforms the baseline DNABERT-2_BASE model in hearing loss promoter prediction, with a 20% reduction in loss, a 3.3% improvement in the area under the working characteristic curve (AUC) of the subjects, and a 5.8% improvement in accuracy at a sequence length of 600 base pairs. In addition, DNABERT-CBL consistently outperforms other state-of-the-art BERT-based genome models on several evaluation metrics, highlighting its superior generalization ability. Overall, DNABERT-CBL provides an effective framework for accurate promoter prediction, offers valuable insights into gene regulatory mechanisms, and supports the development of gene therapies for hearing loss and related diseases.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e3104"},"PeriodicalIF":2.5,"publicationDate":"2025-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453759/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145132496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Design of artwork resource management system based on block classification coding and bit plane rearrangement.
Xiaomeng Xia
PeerJ Computer Science 11:e3092. Pub Date: 2025-08-12. eCollection Date: 2025-01-01. DOI: 10.7717/peerj-cs.3092
With the vigorous development of the art market, the management of art resources faces increasingly difficult challenges, such as copyright protection, authenticity verification, and efficient storage. Current digital watermarking and compression schemes applied to artworks struggle to achieve an effective balance among robustness, image quality preservation, and watermark capacity, and they lack sufficient scalability when dealing with large-scale datasets. To address these issues, this article proposes an innovative algorithm that integrates watermarking and compression for artwork images, namely the Block Classification Coding-Bit Plane Rearrangement-Integrated Compression and Watermark Embedding (BCC-BPR-ICWE) algorithm. By employing refined block classification coding (RS-BCC) and optimized bit plane rearrangement (BPR) techniques, the algorithm significantly enhances watermark embedding capacity and robustness while preserving image quality. Experimental results demonstrate that, compared to existing classical algorithms, the proposed method excels in watermarked image quality (PSNR > 57 dB, SSIM = 0.9993), watermark capacity (0.5 bpp), and tampering recovery performance (PSNR = 41.17 dB, SSIM = 0.9993). This research provides strong support for the algorithm's practical application in large-scale art resource management systems. The proposed technique not only promotes the application of digital watermarking and compression technologies in the field of art management but also offers new ideas and directions for the future development of related technologies.
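The BCC-BPR-ICWE pipeline itself (block classification, bit plane rearrangement, integrated compression) is not reproduced here, but the bit-plane manipulation it builds on is easy to illustrate. Below is a minimal NumPy sketch, with hypothetical helper names, of embedding and recovering watermark bits in a chosen bit plane:

```python
# Illustrative sketch of generic bit-plane watermark embedding/extraction.
# This shows only the bit-plane idea underlying schemes like BCC-BPR-ICWE,
# not the paper's block classification or rearrangement steps.
import numpy as np

def embed_bitplane(image: np.ndarray, bits: np.ndarray, plane: int = 0) -> np.ndarray:
    """Write watermark bits into the given bit plane of a uint8 image."""
    flat = image.flatten().copy()
    assert bits.size <= flat.size, "watermark larger than host image"
    mask = np.uint8(~(1 << plane) & 0xFF)  # clears the target bit plane
    flat[:bits.size] = (flat[:bits.size] & mask) | (bits.astype(np.uint8) << plane)
    return flat.reshape(image.shape)

def extract_bitplane(image: np.ndarray, n_bits: int, plane: int = 0) -> np.ndarray:
    """Read n_bits watermark bits back from the chosen bit plane."""
    return (image.flatten()[:n_bits] >> plane) & 1

host = np.random.randint(0, 256, (64, 64), dtype=np.uint8)  # stand-in artwork image
wm = np.random.randint(0, 2, 256)                            # watermark bitstream
marked = embed_bitplane(host, wm)
assert np.array_equal(extract_bitplane(marked, wm.size), wm)
```

Writing to the least significant plane (plane 0) keeps the visual distortion minimal, which is why bit-plane schemes can report very high PSNR/SSIM on the watermarked image.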
{"title":"Design of artwork resource management system based on block classification coding and bit plane rearrangement.","authors":"Xiaomeng Xia","doi":"10.7717/peerj-cs.3092","DOIUrl":"10.7717/peerj-cs.3092","url":null,"abstract":"<p><p>With the vigorous development of the art market, the management of art resources is confronted with increasingly difficult challenges, such as copyright protection, authenticity verification, and efficient storage. Currently, the digital watermarking and compression schemes applied to artworks struggle to achieve an effective balance among robustness, image quality preservation, and watermark capacity. Moreover, they lack sufficient scalability when dealing with large-scale datasets. To address these issues, this article proposes an innovative algorithm that integrates watermarking and compression for artwork images, namely the Block Classification Coding-Bit Plane Rearrangement-Integrated Compression and Watermark Embedding (BCC-BPR-ICWE) algorithm. By employing refined block classification coding (RS-BCC) and optimized bit plane rearrangement (BPR) techniques, this algorithm significantly enhances the watermark embedding capacity and robustness while ensuring image quality. Experimental results demonstrate that, compared to existing classical algorithms, the proposed method excels in terms of watermarked image quality (PSNR > 57 dB, SSIM = 0.9993), watermark capacity (0.5 bpp), and tampering recovery performance (PSNR = 41.17 dB, SSIM = 0.9993). The research in this article provides strong support for its practical application in large-scale art resource management systems. The proposed technique not only promotes the application of digital watermarking and compression technologies in the field of art management but also offers new ideas and directions for the future development of related technologies.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e3092"},"PeriodicalIF":2.5,"publicationDate":"2025-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453755/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145132552","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A novel deep learning approach for predicting stone-free rates post-ESWL on uncontrasted CT.
Ozgur Efiloglu, Muhammed Yildirim, Kadir Yildirim, Harun Bingol, Mustafa Kaan Akalin, Meftun Culpan, Bilal Alatas, Asif Yildirim
PeerJ Computer Science 11:e3111. Pub Date: 2025-08-11. eCollection Date: 2025-01-01. DOI: 10.7717/peerj-cs.3111
Extracorporeal shock wave lithotripsy (ESWL) is one of the most commonly employed treatment methods for managing kidney stones. In this work, we sought to assess the efficacy of an artificial intelligence model developed on non-contrast computed tomography (CT) images in predicting stone-free rates after ESWL. The main difference between this study and previous work is that it proposes an artificial intelligence-based model that predicts the success of ESWL treatment. Data from 910 patients who underwent ESWL between January 2016 and June 2021 were analyzed retrospectively. Because the local binary pattern (LBP) and histogram of oriented gradients (HOG) feature extraction methods gave more successful results than the alternatives, their features were combined and a new feature map was obtained using the neighborhood component analysis (NCA) dimension reduction method. The reduced feature map was then passed to the classifiers. We analyzed the effect of ESWL treatment using different artificial intelligence methods and found that the prediction accuracy was 94% on average. Results were obtained from seven different convolutional neural networks (CNNs) and two texture-based models. Since the texture-based models achieved the highest success, they were used as the basis of the proposed model, which achieved better results than the nine other models used in the study. The results suggest that the proposed hybrid model can guide experts in planning ESWL treatment.
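The feature pipeline described above maps naturally onto scikit-image and scikit-learn. A brief sketch under assumed parameters (the LBP neighborhood, HOG cell sizes, NCA dimensionality, and final classifier are illustrative choices, not the paper's exact configuration):

```python
# Sketch of the described pipeline: LBP and HOG features are concatenated,
# reduced with supervised NCA, then classified. Synthetic data stands in
# for the CT slices; all parameters are illustrative.
import numpy as np
from skimage.feature import local_binary_pattern, hog
from sklearn.neighbors import NeighborhoodComponentsAnalysis, KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def lbp_hog_features(image: np.ndarray) -> np.ndarray:
    gray = (image * 255).astype(np.uint8)
    # "uniform" LBP with P=8 yields pattern codes 0..9, hence 10 histogram bins.
    lbp = local_binary_pattern(gray, P=8, R=1, method="uniform")
    lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    hog_vec = hog(gray, orientations=9, pixels_per_cell=(16, 16), cells_per_block=(2, 2))
    return np.concatenate([lbp_hist, hog_vec])

rng = np.random.default_rng(0)
images = rng.random((20, 64, 64))   # stand-in for grayscale CT slices
y = rng.integers(0, 2, 20)          # stand-in stone-free labels (0/1)

X = np.stack([lbp_hog_features(img) for img in images])
model = make_pipeline(StandardScaler(),
                      NeighborhoodComponentsAnalysis(n_components=32, random_state=0),
                      KNeighborsClassifier(n_neighbors=3))
model.fit(X, y)
```

Note that scikit-learn's NCA is supervised, so it uses the labels during dimension reduction, consistent with the study's use of labeled outcomes before classification.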
{"title":"A novel deep learning approach for predicting stone-free rates post-ESWL on uncontrasted CT.","authors":"Ozgur Efiloglu, Muhammed Yildirim, Kadir Yildirim, Harun Bingol, Mustafa Kaan Akalin, Meftun Culpan, Bilal Alatas, Asif Yildirim","doi":"10.7717/peerj-cs.3111","DOIUrl":"10.7717/peerj-cs.3111","url":null,"abstract":"<p><p>Extracorporeal shock wave lithotripsy (ESWL) is one of the most often employed therapy methods for managing kidney stones. In our work, we sought to assess the efficacy of the artificial intelligence model developed using non-contrast computed tomography (CT) images in predicting stone-free rates for ESWL. The main difference between this study and other studies is that it proposes an artificial intelligence-based model that predicts the success of ESWL treatment using artificial intelligence methods. Data from 910 patients who underwent ESWL between January 2016 and June 2021 were analyzed retrospectively. Since the local binary pattern (LBP) and histogram of oriented gradients (HOG) feature extraction methods gave more successful results than other methods, a new feature map was obtained using the neighborhood component analysis (NCA) dimension reduction method after combining the features obtained using these methods. Then, the reduced feature map was classified into classifiers. In conclusion, we analyzed the effect of ESWL treatment using different artificial intelligence methods and found that the prediction accuracy was 94% on average. Results were obtained from seven different convolutional neural networks (CNNs) and two textural-based models in the study. Since textural-based models achieved the highest success among these models, these models were used as the base in the proposed model. The proposed model achieved better results than nine different models used in the study. When the results obtained from the proposed hybrid model for ESWL prediction are examined, this model will guide experts in the treatment of the disease.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e3111"},"PeriodicalIF":2.5,"publicationDate":"2025-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453815/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145132627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Morphological and structural complexity analysis of low-resource English-Turkish language pair using neural machine translation models.
Mehmet Acı, Nisa Vuran Sarı, Çiğdem İnan Acı
PeerJ Computer Science 11:e3072. Pub Date: 2025-08-11. eCollection Date: 2025-01-01. DOI: 10.7717/peerj-cs.3072
Neural machine translation (NMT) has achieved remarkable success in high-resource language pairs; however, its effectiveness for morphologically rich and low-resource languages like Turkish remains underexplored. As a highly agglutinative and morphologically complex language with limited high-quality parallel data, Turkish serves as a representative case for evaluating NMT systems in low-resource and linguistically challenging settings. Its structural divergence from English makes it a critical testbed for assessing tokenization strategies, attention mechanisms, and model generalizability in neural translation. This study investigates the comparative performance of two prominent NMT paradigms, the Transformer architecture and recurrent sequence-to-sequence (Seq2Seq) models with attention, for both English-to-Turkish and Turkish-to-English translation. The models are evaluated under various configurations, including different tokenization strategies (Byte Pair Encoding (BPE) vs. word tokenization), attention mechanisms (Bahdanau and an exploratory hybrid mechanism combining Bahdanau and scaled dot-product attention), and architectural depths (layer count and attention head number). Extensive experiments using automatic metrics such as BiLingual Evaluation Understudy (BLEU), Metric for Evaluation of Translation with Explicit ORdering (METEOR), and Translation Error Rate (TER) reveal that the Transformer model with three layers, eight attention heads, and BPE tokenization achieved the best performance, obtaining a BLEU score of 47.85 and a METEOR score of 44.62 in the English-to-Turkish direction. Similar performance trends were observed in the reverse direction, indicating the model's generalizability. These findings highlight the potential of carefully optimized Transformer-based NMT systems for handling the complexities of morphologically rich, low-resource languages like Turkish in both translation directions.
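Scores of this form can be computed with standard tooling; the authors' exact evaluation toolchain is not specified, so the following is one common way to obtain comparable corpus-level BLEU and TER numbers with sacreBLEU (METEOR is available separately, e.g., through NLTK):

```python
# Corpus-level BLEU and TER with sacreBLEU. Hypotheses and references here
# are toy placeholders, not data from the study.
from sacrebleu.metrics import BLEU, TER

hypotheses = ["the cat is on the mat", "he went to the market"]    # system output
references = [["the cat sat on the mat", "he went to the market"]]  # one reference stream

bleu = BLEU()
ter = TER()
print(bleu.corpus_score(hypotheses, references))
print(ter.corpus_score(hypotheses, references))
```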
{"title":"Morphological and structural complexity analysis of low-resource English-Turkish language pair using neural machine translation models.","authors":"Mehmet Acı, Nisa Vuran Sarı, Çiğdem İnan Acı","doi":"10.7717/peerj-cs.3072","DOIUrl":"10.7717/peerj-cs.3072","url":null,"abstract":"<p><p>Neural machine translation (NMT) has achieved remarkable success in high-resource language pairs; however, its effectiveness for morphologically rich and low-resource languages like Turkish remains underexplored. As a highly agglutinative and morphologically complex language with limited high-quality parallel data, Turkish serves as a representative case for evaluating NMT systems on low-resource and linguistically challenging settings. Its structural divergence from English makes it a critical testbed for assessing tokenization strategies, attention mechanisms, and model generalizability in neural translation. This study investigates the comparative performance of two prominent NMT paradigms-the Transformer architecture, and recurrent-based sequence-to-sequence (Seq2Seq) models with attention for both English-to-Turkish and Turkish-to-English translation. The models are evaluated under various configurations, including different tokenization strategies (Byte Pair Encoding (BPE) <i>vs</i>. Word Tokenization), attention mechanisms (Bahdanau and an exploratory hybrid mechanism combining Bahdanau and Scaled Dot-Product attention), and architectural depths (layer count and attention head number). Extensive experiments using automatic metrics such as BiLingual Evaluation Understudy (BLEU), Metric for Evaluation of Translation with Explicit ORdering (METEOR), and Translation Error Rate (TER) reveal that the Transformer model with three layers, eight attention heads, and BPE tokenization achieved the best performance, obtaining a BLEU score of 47.85 and METEOR score of 44.62 in the English-to-Turkish direction. Similar performance trends were observed in the reverse direction, indicating the model's generalizability. These findings highlight the potential of carefully optimized Transformer-based NMT systems in handling the complexities of morphologically rich, low-resource languages like Turkish in both translation directions.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e3072"},"PeriodicalIF":2.5,"publicationDate":"2025-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453858/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145132730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A review of deep learning methods in aquatic animal husbandry.
Marzuraikah Mohd Stofa, Fatimah Az Zahra Azizan, Mohd Asyraf Zulkifley
PeerJ Computer Science 11:e3105. Pub Date: 2025-08-11. eCollection Date: 2025-01-01. DOI: 10.7717/peerj-cs.3105
Aquatic animal husbandry is crucial for global food security and supports millions of livelihoods around the world. With the growing demand for seafood, this industry has become economically significant for many regions, contributing to local and global economies. However, as the industry grows, it faces major challenges that are not encountered in small-scale setups. Traditional methods for classifying, detecting, and monitoring aquatic animals are often time-consuming, labor-intensive, and prone to inaccuracies. The labor-intensive nature of these operations has led many aquaculture operators to move towards automation systems. Yet, for an automation system to be effectively deployed, it needs an intelligent decision-making system, which is where deep learning techniques come into play. This article concisely summarizes an extensive methodological review of machine learning methods, primarily the deep learning methods used in aquatic animal husbandry. It focuses on the use of deep learning in three key areas: classification, localization, and segmentation. Generally, classification techniques are vital for distinguishing between different species of aquatic organisms, while localization methods identify an animal's position within a video or an image. Segmentation techniques, on the other hand, enable the precise delineation of organism boundaries, which is essential information for accurate monitoring systems. Among these key areas, segmentation techniques, particularly the U-Net model, have shown the best results, achieving segmentation performance as high as 94.44%. This article also highlights the potential of deep learning to enhance the precision, productivity, and sustainability of automated operations in aquatic animal husbandry. Looking ahead, deep learning offers huge potential to transform the aquaculture industry in terms of cost and operations. Future research should focus on refining existing models to better address real-world challenges such as sensor input quality and multi-modal data across various environments for better automation in the aquaculture industry.
{"title":"A review of deep learning methods in aquatic animal husbandry.","authors":"Marzuraikah Mohd Stofa, Fatimah Az Zahra Azizan, Mohd Asyraf Zulkifley","doi":"10.7717/peerj-cs.3105","DOIUrl":"10.7717/peerj-cs.3105","url":null,"abstract":"<p><p>Aquatic animal husbandry is crucial for global food security and supports millions of livelihoods around the world. With the growing demand for seafood, this industry has become economically significant for many regions, contributing to local and global economies. However, as the industry grows, it faces various major challenges that are not encountered in small-scale setups. Traditional methods for classifying, detecting, and monitoring aquatic animals are often time-consuming, labor-intensive, and prone to inaccuracies. The labor-intensive nature of these operations has led many aquaculture operators to move towards automation systems. Yet, for an automation system to be effectively deployed, it needs an intelligent decision-making system, which is where deep learning techniques come into play. In this article, an extensive methodological review of machine learning methods, primarily the deep learning methods used in aquatic animal husbandry are concisely summarized. This article focuses on the use of deep learning in three key areas: classification, localization, and segmentation. Generally, classification techniques are vital in distinguishing between different species of aquatic organisms, while localization methods are used to identify the respective animal's position within a video or an image. Segmentation techniques, on the other hand, enable the precise delineation of organism boundaries, which is essential information in accurate monitoring systems. Among these key areas, segmentation techniques, particularly through the U-Net model, have shown the best results, even achieving a high segmentation performance of 94.44%. This article also highlights the potential of deep learning to enhance the precision, productivity, and sustainability of automated operations in aquatic animal husbandry. Looking ahead, deep learning offers huge potential to transform the aquaculture industry in terms of cost and operations. Future research should focus on refining existing models to better address real-world challenges such as sensor input quality and multi-modal data across various environments for better automation in the aquaculture industry.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e3105"},"PeriodicalIF":2.5,"publicationDate":"2025-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453753/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145132412","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Innovative multi objective optimization based automatic fake news detection.
Cebrail Barut, Suna Yildirim, Bilal Alatas, Gungor Yildirim
PeerJ Computer Science 11:e3016. Pub Date: 2025-08-11. DOI: 10.7717/peerj-cs.3016
With the digital revolution, access to information is expanding day by day, and individuals can obtain information quickly through the internet and social media platforms. However, in most cases there is no mechanism in place to evaluate the accuracy of news that spreads rapidly on social media, which increases the potential for fake news to mislead both individuals and society. To minimize the negative effects of fake news, it has become a critical necessity to detect it quickly and effectively. Metaheuristic methods can provide more effective solutions for fake news detection than traditional methods; especially on small datasets, metaheuristics are known to produce faster and more effective solutions than artificial intelligence and machine learning-based methods. In the literature, the majority of fake news detection studies have focused on the optimization of a single criterion. In this study, unlike others, a method is developed that enables the simultaneous optimization of two criteria (precision and recall) in fake news detection. The proposed approach presents an innovative solution by using the Crowding Distance Level method instead of the Crowding Distance method used in the standard Non-dominated Sorting Genetic Algorithm 2 (NSGA-2) algorithm. The proposed method is tested on four datasets, including COVID-19, Syrian war daily news, and FakeNewsNet (Gossipcop). The results show that the proposed method achieves high success, especially on small datasets.
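The bi-objective machinery that NSGA-2 builds on is compact enough to sketch. Below is a plain-Python illustration of Pareto dominance and the standard crowding distance over (precision, recall) pairs; the paper's Crowding Distance Level replacement is not reproduced here, so treat this as the baseline it modifies:

```python
# Pareto dominance and standard crowding distance for the bi-objective
# (precision, recall) setting that NSGA-2 sorts and prunes with.

def dominates(a, b):
    """a dominates b if it is no worse on both objectives and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points):
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

def crowding_distance(front):
    n, dist = len(front), {p: 0.0 for p in front}
    for m in range(2):  # two objectives: precision, recall
        ordered = sorted(front, key=lambda p: p[m])
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")  # keep boundary solutions
        span = ordered[-1][m] - ordered[0][m] or 1.0
        for i in range(1, n - 1):
            dist[ordered[i]] += (ordered[i + 1][m] - ordered[i - 1][m]) / span
    return dist

# Toy candidate detectors as (precision, recall) pairs.
solutions = [(0.91, 0.62), (0.85, 0.78), (0.70, 0.90), (0.60, 0.55), (0.65, 0.50)]
front = pareto_front(solutions)
print(front)                    # the non-dominated precision/recall trade-offs
print(crowding_distance(front))  # larger distance = less crowded, kept longer
```

Selection then prefers lower non-domination rank and, within a rank, larger crowding distance, which is exactly the step the paper swaps for its Crowding Distance Level variant.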
{"title":"Innovative multi objective optimization based automatic fake news detection.","authors":"Cebrail Barut, Suna Yildirim, Bilal Alatas, Gungor Yildirim","doi":"10.7717/peerj-cs.3016","DOIUrl":"10.7717/peerj-cs.3016","url":null,"abstract":"<p><p>With the digital revolution, access to information is expanding day by day and individuals can access information quickly through the internet and social media platforms. However, in most cases, there is no mechanism in place to evaluate the accuracy of news that spreads rapidly on social media. This increases the potential for fake news to mislead both individuals and society. In order to minimize the negative effects of fake news, it has become a critical necessity to detect them quickly and effectively. Metaheuristic methods can provide more effective solutions in fake news detection compared to traditional methods. Especially in small datasets, metaheuristics are known to produce faster and more effective solutions than artificial intelligence and machine learning based methods. In the literature, the majority of fake news detection studies have focused on the optimization of a single criterion. In this study, unlike other studies, a method that enables simultaneous optimization of two criteria (precision and recall) in fake news detection is developed. In the proposed approach, an innovative solution is presented by using the Crowding Distance Level method instead of the Crowding Distance method used in the standard Non-dominated Sorting Genetic Algorithm 2 (NSGA-2) algorithm. The proposed method is tested on four different datasets such as Covid-19, Syrian war daily news and FakeNewsNet (Gossipcop). The results show that the proposed method achieves high success especially on small datasets.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e3016"},"PeriodicalIF":2.5,"publicationDate":"2025-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453838/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145132683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Detection of offensive content in the Kazakh language using machine learning and deep learning approaches.
Milana Bolatbek, Moldir Sagynay, Shynar Mussiraliyeva, Zhastay Yeltay
PeerJ Computer Science 11:e3027. Pub Date: 2025-08-11. DOI: 10.7717/peerj-cs.3027
This article addresses the urgent need to detect destructive content, including religious extremism, racism, cyberbullying, and nation-oriented extremist messages, on social media platforms in the Kazakh language. Given the agglutinative structure and rich morphology of Kazakh, standard natural language processing (NLP) models require significant adaptation. The study employs a range of machine learning and deep learning techniques, such as logistic regression, support vector machines (SVM), and long short-term memory (LSTM) networks, to classify destructive content. The article demonstrates the effectiveness of combining n-gram and stemming methods with machine learning algorithms, achieving high accuracy in content classification. The findings underscore the importance of developing language-specific NLP tools tailored to Kazakh's linguistic complexities. This research not only contributes to ensuring online safety by detecting destructive content in Kazakh digital spaces but also provides a framework for applying similar techniques to other lesser-resourced languages.
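A minimal sketch of the n-gram plus classical-classifier setup follows, using character n-grams, a common choice for agglutinative languages, in place of the paper's stemming step (no standard Kazakh stemmer is assumed); the vectorizer settings and classifier are illustrative:

```python
# Character n-gram TF-IDF features feeding a logistic regression classifier,
# a generic stand-in for the paper's n-gram + stemming + ML setup.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["placeholder kazakh post one", "placeholder kazakh post two"]  # social media messages
labels = [1, 0]                                                          # 1 = destructive content

clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5)),  # sub-word features tolerate rich morphology
    LogisticRegression(max_iter=1000),
)
clf.fit(texts, labels)
print(clf.predict(["a new post to screen"]))
```

Character n-grams capture stems and affixes implicitly, which is why they often rival explicit stemming pipelines on morphologically rich languages.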
{"title":"Detection of offensive content in the Kazakh language using machine learning and deep learning approaches.","authors":"Milana Bolatbek, Moldir Sagynay, Shynar Mussiraliyeva, Zhastay Yeltay","doi":"10.7717/peerj-cs.3027","DOIUrl":"10.7717/peerj-cs.3027","url":null,"abstract":"<p><p>This article addresses the urgent need to detect destructive content, including religious extremism, racism, cyberbullying, and nation oriented extremism messages, on social media platforms in the Kazakh language. Given the agglutinative structure and rich morphology of Kazakh, standard natural language processing (NLP) models require significant adaptation. The study employs a range of machine learning and deep learning techniques, such as logistic regression, support vector machines (SVM), and long short-term memory (LSTM) networks, to classify destructive content. This article demonstrates the effectiveness of combining n-gram and stemming methods with machine learning algorithms, achieving high accuracy in content classification. The findings underscore the importance of developing language-specific NLP tools tailored to Kazakh's linguistic complexities. This research not only contributes to ensuring online safety by detecting destructive content in Kazakh digital spaces, but also provides a framework for applying similar techniques to other lesser-resourced languages.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e3027"},"PeriodicalIF":2.5,"publicationDate":"2025-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453855/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145132568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DDSUD: dynamically detecting subsequence uncertainty and diversity for active learning in imbalanced Chinese sentiment analysis.
Shufeng Xiong, Yibo Si, Guipei Zhang, Bingkun Wang, Guang Zheng, Haiping Si
PeerJ Computer Science 11:e3091. Pub Date: 2025-08-11. DOI: 10.7717/peerj-cs.3091
Sentiment structure analysis in Chinese text typically relies on supervised deep-learning methods for sequence labeling. However, obtaining large-scale labeled datasets is both resource-intensive and time-consuming. To address these challenges, this study proposes Dynamically Detecting Subsequence Uncertainty and Diversity (DDSUD), a Bidirectional Encoder Representations from Transformers (BERT)-based active learning framework designed to tackle subsequence uncertainty and enhance the diversity of imbalanced datasets. DDSUD combines subsequence uncertainty detection, diversity-driven sample selection, and dynamic weighting, enabling an adaptive balance between these factors throughout the active learning iterations. Experimental results show that DDSUD achieves performance close to fully supervised training with only 50% of the data labeled, and it outperforms other state-of-the-art active learning methods given the same amount of labeled data. Moreover, by dynamically adjusting the trade-off between subsequence uncertainty and diversity, DDSUD demonstrates strong adaptability and generalization capability in low-resource environments, especially in handling imbalanced datasets, significantly improving the recognition of minority-class samples.
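The exact DDSUD scoring is not spelled out in the abstract, so the following is a generic sketch of the acquisition idea: mix predictive-entropy uncertainty with distance-to-labeled-pool diversity under a weight that shifts across iterations. The function names and the weighting schedule are assumptions for illustration:

```python
# Generic uncertainty + diversity acquisition with a dynamic weight, standing
# in for DDSUD's selection step; its subsequence-level treatment and exact
# weighting are not public details and are not reproduced here.
import numpy as np

def entropy(probs):
    """Predictive entropy per example; probs has shape (N, num_classes)."""
    return -np.sum(probs * np.log(probs + 1e-12), axis=1)

def diversity(unlabeled_emb, labeled_emb):
    """Min distance from each unlabeled embedding to the labeled pool."""
    d = np.linalg.norm(unlabeled_emb[:, None, :] - labeled_emb[None, :, :], axis=2)
    return d.min(axis=1)

def select(probs, unlabeled_emb, labeled_emb, step, total_steps, k=8):
    lam = step / total_steps  # dynamic weight: uncertainty early, diversity late
    u = entropy(probs)
    v = diversity(unlabeled_emb, labeled_emb)
    score = (1 - lam) * u / (u.max() + 1e-12) + lam * v / (v.max() + 1e-12)
    return np.argsort(-score)[:k]  # indices of the k examples to label next

rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(3), size=100)  # toy model predictions, 3 classes
idx = select(probs, rng.normal(size=(100, 16)), rng.normal(size=(20, 16)),
             step=2, total_steps=10)
print(idx)
```

Favoring diversity increasingly over iterations is one plausible reading of "dynamic weighting"; it also tends to surface minority-class examples in imbalanced pools, consistent with the reported gains.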
{"title":"DDSUD: dynamically detecting subsequence uncertainty and diversity for active learning in imbalanced Chinese sentiment analysis.","authors":"Shufeng Xiong, Yibo Si, Guipei Zhang, Bingkun Wang, Guang Zheng, Haiping Si","doi":"10.7717/peerj-cs.3091","DOIUrl":"10.7717/peerj-cs.3091","url":null,"abstract":"<p><p>Sentiment structure analysis in Chinese text typically relies on supervised deep-learning methods for sequence labeling. However, obtaining large-scale labeled datasets is both resource-intensive and time-consuming. To address these challenges, this study proposes Dynamically Detecting Subsequence Uncertainty and Diversity (DDSUD), a Bidirectional Encoder Representations from Transformers (BERT)-based active learning framework designed to tackle subsequence uncertainty and enhance the diversity of imbalanced datasets. DDSUD combines subsequence uncertainty detection, diversity-driven sample selection, and dynamic weighting, enabling an adaptive balance between these factors throughout the active learning iterations. Experimental results show that DDSUD achieves performance close to fully supervised training schemes with only 50% of the data labeled, and outperforms other state-of-the-art active learning methods with the same amount of labeled data. Moreover, by dynamically adjusting the trade-off between subsequence uncertainty and diversity, DDSUD demonstrates strong adaptability and generalization capability in low-resource environments, especially in handling imbalanced datasets, significantly improving the recognition of minority class samples.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e3091"},"PeriodicalIF":2.5,"publicationDate":"2025-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453870/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145132476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}