
PeerJ Computer Science: Latest Articles

Feature selection for emotion recognition in speech: a comparative study of filter and wrapper methods.
IF 2.5 | Zone 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-09-16 | eCollection Date: 2025-01-01 | DOI: 10.7717/peerj-cs.3180
Alaa Altheneyan, Aseel Alhadlaq

Feature selection is essential for enhancing the performance and reducing the complexity of speech emotion recognition models. This article evaluates various feature selection methods, including correlation-based (CB), mutual information (MI), and recursive feature elimination (RFE), against baseline approaches using three different feature sets: (1) all available features (Mel-frequency cepstral coefficients (MFCC), root mean square energy (RMS), zero crossing rate (ZCR), chromagram, spectral centroid frequency (SCF), Tonnetz, Mel spectrogram, and spectral bandwidth), totaling 170 features; (2) a five-feature subset (MFCC, RMS, ZCR, Chromagram, and Mel spectrogram), totaling 163 features; and (3) a six-feature subset (MFCC, RMS, ZCR, SCF, Tonnetz, and Mel spectrogram), totaling 157 features. Methods are compared based on precision, recall, F1-score, accuracy, and the number of features selected. Results show that using all features yields an accuracy of 61.42%, but often includes irrelevant data. MI with 120 features achieves the highest performance, with precision, recall, F1-score, and accuracy at 65%, 65%, 65%, and 64.71%, respectively. CB methods with moderate thresholds also perform well, balancing simplicity and accuracy. RFE methods improve consistently with more features, stabilizing around 120 features.
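To make the idea of a correlation-based filter concrete, here is a minimal sketch in plain Python. It assumes one common variant (greedily dropping any feature whose absolute Pearson correlation with an already kept feature exceeds a threshold); the abstract does not say which variant the authors used, and the function names, the toy data, and the threshold of 0.9 are all illustrative.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant column: treat it as uncorrelated
    return cov / (sx * sy)

def correlation_filter(columns, threshold=0.9):
    """Greedily keep the indices of features whose |r| with every
    previously kept feature is at most `threshold` (hypothetical helper)."""
    kept = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) <= threshold for k in kept):
            kept.append(j)
    return kept

# Toy data: feature 1 is an exact multiple of feature 0, feature 2 is unrelated.
features = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],
    [5.0, 1.0, 4.0, 2.0, 3.0],
]
selected = correlation_filter(features, threshold=0.9)
print(selected)  # feature 1 is redundant with feature 0 and is dropped
```

A lower threshold removes more features; the "moderate thresholds" mentioned in the abstract would correspond to tuning this single parameter.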

Citations: 0
Enhancing human activity recognition with machine learning: insights from smartphone accelerometer and magnetometer data.
IF 2.5 | Zone 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-09-15 | eCollection Date: 2025-01-01 | DOI: 10.7717/peerj-cs.3137
Luis Augusto Silva Zendron, Paulo Jorge Coelho, Christophe Soares, Ivo Pereira, Ivan Miguel Pires

The domain of Human Activity Recognition (HAR) has undergone a remarkable evolution, driven by advancements in sensor technology, artificial intelligence (AI), and machine learning algorithms. This article builds on previously obtained results: it applies additional techniques to the same dataset, namely neural networks with different configurations, random forest, support vector machine, CN2 rule inducer, Naive Bayes, and AdaBoost, with the aim of improving on the results reported in earlier studies. The methodology consists of data collection from smartphone sensors, data cleaning and normalization, feature extraction, and the implementation of various machine learning models for recognizing human activities. The results showed that the neural network and random forest models were highly effective across multiple metrics. The models achieved an area under the curve (AUC) of 98.42%, a classification accuracy of 90.14%, an F1-score of 90.13%, a precision of 90.18%, and a recall of 90.14%. With significantly reduced computational cost, our approach outperforms earlier models using the same dataset and achieves results comparable to those of contemporary deep learning-based approaches. Unlike prior studies, our work utilizes non-normalized data and integrates magnetometer signals to enhance performance, all while employing lightweight models within a reproducible visual workflow. The approach is novel, efficient, and deployable on mobile devices in real time, making it an ideal fit for real-time mobile applications.
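The reported accuracy and recall are both 90.14%. That is what one would expect if the per-class scores were averaged with class-support weights, since support-weighted recall is algebraically equal to accuracy. The abstract does not state the averaging scheme, so this is an assumption; the sketch below, with made-up labels and a hypothetical `weighted_metrics` helper, only illustrates the identity.

```python
from collections import Counter

def weighted_metrics(y_true, y_pred):
    """Return accuracy plus support-weighted precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weight = support[label] / n  # share of true samples in this class
        precision += weight * prec
        recall += weight * rec
        f1 += weight * f
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Illustrative labels only; not data from the study.
y_true = ["walk", "walk", "run", "run", "stand", "stand", "stand", "walk"]
y_pred = ["walk", "run", "run", "run", "stand", "walk", "stand", "walk"]
scores = weighted_metrics(y_true, y_pred)
print(scores)
```

Each class contributes (support / n) * (tp / support) = tp / n to the weighted recall, so the sum is the fraction of correct predictions, which is the accuracy.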

Citations: 0
Research on furniture image classification based on MobileNetNAK.
IF 2.5 | Zone 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-09-15 | eCollection Date: 2025-01-01 | DOI: 10.7717/peerj-cs.3178
Danyang Zhang, Yi Zhai, Peiyuan Li, Fan Yang, Runpeng Du

With the rapid development of the furniture industry, automatic classification of furniture images has become an important research area. However, this task faces several challenges, including complex image backgrounds, diverse furniture types, and varying forms. To address these issues, we propose a novel furniture image classification method, MobileNetNAK, based on the MobileNetV3 network. First, the method integrates a non-local attention module to capture non-local dependencies within images, significantly enhancing the model's ability to extract key information. Second, the Adamax optimizer is employed to train the model. By adaptively adjusting the learning rate, it accelerates convergence and reduces the risk of overfitting. Third, the Kolmogorov-Arnold networks method is incorporated to decompose complex convolution operations into multiple simpler ones, thereby improving computational efficiency and feature extraction capabilities. Experimental results demonstrate that MobileNetNAK significantly improves classification performance in furniture image tasks. On Dataset 1, the model achieves improvements of 6.7%, 6.6%, 6.6%, and 6.6% in accuracy, precision, recall, and F1-score, respectively, compared to the baseline. On Dataset 2, the corresponding improvements are 2.7%, 2.4%, 2.7%, and 2.9%. Additionally, the model maintains a high inference speed of 147.80 fps, balancing performance with computational efficiency. These results highlight the strong adaptability and deployment potential of MobileNetNAK in multi-category and fine-grained furniture image classification tasks, offering a novel and effective solution for this domain.
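For readers unfamiliar with Adamax, the following is a minimal scalar sketch of its standard update rule (the infinity-norm variant of Adam): a running mean of gradients divided by a decaying running maximum of gradient magnitudes. It is not the authors' training code; the quadratic objective, the learning rate of 0.1, and the step count are illustrative assumptions.

```python
def adamax_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999,
                    steps=2000, eps=1e-8):
    """Minimize a 1-D function with the Adamax update, given its gradient."""
    x, m, u = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g   # first-moment (mean) estimate
        u = max(beta2 * u, abs(g))        # infinity-norm running maximum
        # Bias-correct the first moment, then scale the step by 1 / u.
        x -= (lr / (1 - beta1 ** t)) * m / (u + eps)
    return x

# Toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3).
x_min = adamax_minimize(lambda x: 2 * (x - 3.0), x0=0.0)
print(x_min)
```

Because the step is normalized by the running maximum `u`, each update has magnitude of roughly `lr` or less, which is the sense in which the effective learning rate adapts to the gradient scale.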

Citations: 0
A novel framework for secure cryptocurrency transactions using quantum crypto guard.
IF 2.5 | Zone 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-09-12 | eCollection Date: 2025-01-01 | DOI: 10.7717/peerj-cs.3030
Jamil Abedalrahim Jamil Alsayaydeh, Mohd Faizal Yusof, Nor Adnan Yahaya, Viacheslav Kovtun, Safarudin Gazali Herawan

In today's digital world, cryptocurrencies like Bitcoin can secure transactions without banks. However, the rise of quantum computing poses significant threats to their security, as traditional cryptographic methods may be easily compromised. In addition, existing algorithms suffer from slow transaction speeds, interoperability issues between different cryptocurrencies, and privacy concerns. Hence, Quantum Crypto Guard for Secure Transactions (QCG-ST), a novel blockchain framework, is introduced, offering enhanced security and efficiency for cryptocurrency transactions. QCG-ST employs lattice-based cryptography, built on the Ring Learning With Errors (Ring-LWE) problem, to provide robust protection against quantum attacks, and incorporates a new consensus mechanism to increase transaction speed and reduce energy consumption. It uses sharding, a Proof-of-Stake (PoS) consensus method, and a threshold signature scheme (TSS) to make the system more scalable and use less energy. Zero-knowledge proofs (ZKPs) are used to verify transactions without revealing private information. We offer a cross-chain atomic swap protocol that uses hashed time-lock contracts to ensure that swaps work across platforms. The blockchain transaction data used in testing came from the Bitcoin Historical Dataset available on Kaggle, and quantum resistance was assessed using the Qiskit Aer simulator. The framework's performance was compared with that of existing methods such as Payment Channel-Lightning Network (PC-LN), Variational Quantum Eigensolver (VQE), and Cross-Chain Transaction with Hyperledger (CCT-H). Results show that QCG-ST performs far better than these systems in terms of transaction success rate (up to 98.5%), speed, energy efficiency, latency, and throughput, especially when tested in a quantum-simulated environment. This study fills an essential gap in blockchain technology by proposing a strong, quantum-resistant, privacy-protecting architecture that can handle the problems that could arise in decentralized digital banking in the future.
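The hashed time-lock contract mentioned in the abstract can be illustrated with a small in-memory model. This is a generic sketch of the HTLC idea, not the QCG-ST protocol: the class name, the integer clock, and the use of SHA-256 are assumptions made for the example, and no real blockchain API is involved.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class HashedTimeLock:
    """Toy HTLC: funds unlock with the hash preimage before expiry,
    otherwise the sender can reclaim them after expiry."""
    hash_lock: bytes       # SHA-256 digest of the secret preimage
    expiry: int            # time from which a refund becomes possible
    amount: int
    state: str = "locked"  # "locked", "claimed", or "refunded"

    def claim(self, preimage: bytes, now: int) -> bool:
        ok = (self.state == "locked" and now < self.expiry
              and hashlib.sha256(preimage).digest() == self.hash_lock)
        if ok:
            self.state = "claimed"
        return ok

    def refund(self, now: int) -> bool:
        ok = self.state == "locked" and now >= self.expiry
        if ok:
            self.state = "refunded"
        return ok

secret = b"swap-secret"
lock = hashlib.sha256(secret).digest()

# Alice locks funds for Bob on chain A with the longer timeout;
# Bob locks funds for Alice on chain B with the shorter one.
chain_a = HashedTimeLock(lock, expiry=200, amount=5)
chain_b = HashedTimeLock(lock, expiry=100, amount=7)

alice_claims = chain_b.claim(secret, now=50)  # claiming reveals the secret
bob_claims = chain_a.claim(secret, now=60)    # Bob reuses the revealed secret

# A contract that nobody claims with the right preimage can be refunded.
stale = HashedTimeLock(lock, expiry=100, amount=1)
wrong_claim = stale.claim(b"not-the-secret", now=10)
refunded = stale.refund(now=150)
```

The swap is atomic in the sense that both contracts share one hash lock: either the secret is revealed and both sides can claim, or neither claim happens and both sides refund after their timeouts. The staggered expiries give the second claimant time to act after the secret appears.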

Citations: 0
Financial trading decision model based on deep reinforcement learning for smart agricultural management.
IF 2.5 | Zone 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-09-12 | eCollection Date: 2025-01-01 | DOI: 10.7717/peerj-cs.3196
Di Fan, Nazrul Hisyam Ab Razak, Wei Ni Soh

This study proposes a decision-making model based on deep reinforcement learning (DRL) for agricultural financial transactions, addressing core challenges such as significant data noise, strong time-series dependence, and limited strategy adaptability. We developed a multifactor dynamic denoising framework by integrating the Grubbs test for outlier detection and the median absolute deviation (MAD) method for noise handling. This framework categorizes agricultural financial indicators into six feature types, significantly enhancing robustness against data noise and improving model reliability. Furthermore, a long short-term memory (LSTM)-enhanced DRL architecture is employed, incorporating a sliding window mechanism to capture market timing features. The framework constructs a transaction cost-based reward function and establishes an intelligent trading decision model based on the LSTM algorithm and the data query language (DQL). Experimental results demonstrate an annualized return of 45.12% and a 35% reduction in maximum drawdown for Deere & Company and BAYN.DE. The Sharpe ratio reaches 1.51, reflecting a 62% improvement over the benchmark model. The results validate the robustness of the proposed decision-making model in the face of price fluctuations and policy interventions. This model addresses critical bottlenecks in the application of DRL in agricultural finance, facilitating the transition of agricultural economic management from empirical judgment to data-driven approaches. Through three key innovations (data denoising, time-series modeling, and domain adaptation), it provides a vital decision-support tool for advancing smart agriculture.
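As a rough illustration of MAD-based noise handling, the sketch below flags outliers using the widely used modified z-score, 0.6745 * (x - median) / MAD, with a cutoff of 3.5. Whether the authors use this exact score or cutoff is not stated in the abstract, so both are assumptions, and the price series is invented for the example.

```python
import statistics

def mad_outliers(values, threshold=3.5):
    """Flag values whose modified z-score exceeds `threshold`.

    The constant 0.6745 rescales the MAD so that the score is comparable
    to a standard z-score for normally distributed data.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return [False] * len(values)  # no spread: nothing can be flagged
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]

# Illustrative price series with one obvious spike.
prices = [10.1, 10.3, 9.9, 10.2, 10.0, 25.0, 10.4]
flags = mad_outliers(prices)
cleaned = [p for p, is_outlier in zip(prices, flags) if not is_outlier]
print(flags)
print(cleaned)
```

Because the median and the MAD are barely affected by the spike itself, the method stays reliable on noisy series where a mean-and-standard-deviation rule would be distorted by the very outliers it is trying to find.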

Citations: 0
Sales forecasting for retail stores using hybrid neural networks and sales-affecting variables.
IF 2.5 | Zone 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-09-11 | eCollection Date: 2025-01-01 | DOI: 10.7717/peerj-cs.3058
Saad Mansur, Kashif Sattar, Seyed Ebrahim Hosseini, Shahbaz Pervez, Iftikhar Ahmad, Kashif Saleem, Ahmed Zohier Elhendi

Accurate sales forecasting is vital for balancing demand and supply and enhancing profitability in the retail sector. Deep learning (DL) models have shown promise in this area; however, most handle either temporal or spatial patterns in isolation. Moreover, many studies rely on synthetic datasets or omit critical contextual variables, reducing real-world accuracy. This study proposes a hybrid convolutional neural network (CNN)-long short-term memory (LSTM) model for retail sales forecasting using real-world data enhanced with environmental and demographic variables such as holidays, salary days, protests, and weather conditions. CNNs capture spatial patterns, while LSTMs model temporal dependencies, making the hybrid architecture well-suited for multivariate forecasting tasks. Our model demonstrates a significant improvement in predictive performance, achieving a mean absolute percentage error (MAPE) of 4.16%, outperforming traditional and standalone neural models. By incorporating external factors, the proposed approach enables more reliable forecasting and supports informed decision-making in retail operations.
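The headline metric, mean absolute percentage error, is the average of |actual - forecast| / |actual| expressed as a percentage. A minimal sketch follows; the sales figures are invented for illustration and the `mape` helper is hypothetical.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)

# Illustrative daily sales, not data from the study.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]
error = mape(actual, forecast)
print(round(error, 2))
```

Here the individual errors are 10%, 5%, and 0%, which average to about 5%. Since each error is divided by the actual value, MAPE weights a miss on a low-sales day more heavily than the same absolute miss on a high-sales day.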

Citations: 0
Particle swarm optimization framework for Parkinson's disease prediction.
IF 2.5 | Zone 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-09-11 | eCollection Date: 2025-01-01 | DOI: 10.7717/peerj-cs.3135
Entesar Hamed I Eliwa, Tarek Abd El-Hafeez

Early diagnosis of Parkinson's disease (PD) is challenging due to subtle initial symptoms. This study introduces an advanced machine learning framework that leverages particle swarm optimization (PSO) to improve PD detection through vocal biomarker analysis. Our novel approach unifies the optimization of both acoustic feature selection and classifier hyperparameter tuning within a single computational architecture. We systematically evaluated PSO-enhanced predictive models for PD detection using two comprehensive clinical datasets. Dataset 1 includes 1,195 patient records with 24 clinical features, and Dataset 2 comprises 2,105 patient records with 33 multidimensional features spanning demographic, lifestyle, medical history, and clinical assessment variables. For Dataset 1, the PSO model achieved 96.7% testing accuracy, an absolute improvement of 2.6% over the best-performing traditional classifier (Bagging classifier at 94.1%), while maintaining exceptional sensitivity (99.0%) and specificity (94.6%). Results were even more significant for Dataset 2, where the PSO model reached 98.9% final accuracy, a 3.9% improvement over the LGBM classifier (95.0%), with near-perfect discriminative capability (AUC = 0.999). These performance gains were achieved with reasonable computational overhead, averaging 250.93 s training time for Dataset 2, suggesting the practical viability of PSO optimization for clinical prediction tasks. Our findings underscore the potential of intelligent optimization techniques in developing practical decision support systems for early neurodegenerative disease detection, with significant implications for clinical practice.
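To show the mechanics of particle swarm optimization, here is a minimal continuous PSO on a toy objective. It is only a sketch of the generic algorithm: the study applies PSO to feature selection and hyperparameter tuning, which would need an encoding and a fitness function (for example, cross-validated accuracy) that the abstract does not describe. The inertia and acceleration values, the swarm size, and the sphere objective are illustrative assumptions.

```python
import random

def pso(objective, dim, bounds, n_particles=30, iters=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over a box with a basic global-best PSO."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # personal best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # global best so far

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)

best_pos, best_val = pso(sphere, dim=2, bounds=(-5.0, 5.0))
print(best_pos, best_val)
```

In a setting like the one the abstract describes, each particle position could encode both a feature mask and classifier hyperparameters, so that a single swarm searches the two spaces jointly; that encoding is a design guess rather than something the abstract specifies.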

PeerJ Computer Science 11: e3135 (published 2025-09-11). Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453757/pdf/
Citations: 0
Applying the S-O-R model to explore impulsive buying behavior driven by influencers on social commerce websites.
IF 2.5 Zone 4 Computer Science Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-09-11 eCollection Date: 2025-01-01 DOI: 10.7717/peerj-cs.3113
Tanaporn Hongsuchon, Shih-Chih Chen, Asif Khan

In recent years, influencer marketing has gained increasing popularity, with many influencers embedding product information into their content (e.g., videos and articles). When fans encounter these messages, they may make unplanned purchases, resulting in impulse buying behavior, a long-standing issue in marketing research. This study aims to explore the factors that lead to such behavior. Using the Stimulus-Organism-Response (S-O-R) model as a framework, the study investigates how interactions between individuals and influencer content (Stimuli) trigger psychological changes in consumers, namely positive affect, flow state, and emotional attachment (Organism), which in turn lead to impulse buying behavior (Response). The study surveyed fans who had previously purchased products recommended by influencers, collecting 404 valid responses. The findings reveal that: (1) Consumers' psychological changes (positive affect, flow state, and emotional attachment) significantly and positively influence impulse buying behavior. (2) Scarcity, discounted price, review quality, and observational learning also have significant positive effects on impulse buying. (3) Social presence and sense of belonging significantly enhance flow state. (4) Entertainment and informativeness significantly enhance emotional attachment.

PeerJ Computer Science 11: e3113 (published 2025-09-11). Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453796/pdf/
Citations: 0
A hybrid deep learning approach with progressive cyclical CNN and firebug swarm optimization for breast cancer detection.
IF 2.5 Zone 4 Computer Science Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-09-11 eCollection Date: 2025-01-01 DOI: 10.7717/peerj-cs.3119
Sudha Prathyusha Jakkaladiki, Filip Malý

The practice of diagnosing breast cancer retains its scope for improvement in medical imaging, where every correct and timely diagnosis enhances the survival rate of patients. This article presents an integrated approach utilizing patch-wise breast image segmentation, hybrid deep feature extraction, followed by progressive cyclical convolutional neural networks (P-CycCNN), and firebug swarm optimization (FSO) to enhance breast cancer detection. This method first incorporates image segmentation by patches to break down the mammography images into smaller patches, which are easier to focus on and allow for the extraction of more features to boost detection rates. Hybrid feature extraction combines convolutional neural network (CNN) features extracted from pre-trained models with handcrafted features that describe texture and shape, thereby enabling the model to grasp the nuances of both coarse and fine images comprehensively. The progressive cyclical CNN strategy incorporates cyclical, re-adjusted learning rates and a progressive training schedule to accelerate and enhance the model's convergence. FSO is introduced to adjust the hyperparameters of the CNN topology, including the learning rate and regularisation parameters, thereby enhancing training and feature-fusion processes. Evaluated on the Curated Breast Imaging Subset of the Digital Database for Screening Mammography (CBIS-DDSM) dataset, the proposed model achieved 98% test accuracy, 95% precision, 97.2% recall, 96% F1-score, and an AUC of 0.95, outperforming baseline CNN models by 4%-6% across key metrics. This approach holds great potential for enhancing detection systems in clinics, allowing earlier and more accurate detection of malignant lesions.
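
The abstract mentions cyclical learning rates without giving the exact schedule. As one common possibility, and purely as an assumption, the sketch below shows a triangular cyclical schedule in which the rate rises linearly from a base value to a maximum and falls back again. The function name `triangular_lr` and the default values are hypothetical and are not taken from the paper.

```python
def triangular_lr(step, base_lr=1e-4, max_lr=1e-2, half_cycle=100):
    """Learning rate at `step` for a triangular cyclical schedule."""
    if half_cycle <= 0:
        raise ValueError("half_cycle must be positive")
    position = step % (2 * half_cycle)          # position inside the current cycle
    if position < half_cycle:
        fraction = position / half_cycle        # rising half: 0 -> 1
    else:
        fraction = 2.0 - position / half_cycle  # falling half: 1 -> 0
    return base_lr + (max_lr - base_lr) * fraction


# Sample the schedule every 50 steps across two full cycles.
schedule = [triangular_lr(s) for s in range(0, 401, 50)]
print(schedule)
```

In this form the rate returns to `base_lr` every `2 * half_cycle` steps; a progressive variant could shrink `max_lr` from one cycle to the next, but that detail is not specified in the abstract.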

PeerJ Computer Science 11: e3119 (published 2025-09-11). Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453785/pdf/
Citations: 0
Unveiling the capabilities of vision transformers in sperm morphology analysis: a comparative evaluation.
IF 2.5 Zone 4 Computer Science Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-09-10 eCollection Date: 2025-01-01 DOI: 10.7717/peerj-cs.3173
Abdulsamet Aktas, Gorkem Serbes, Hamza Osman Ilhan

Traditional sperm morphology assessment relies on manual visual inspection or semi-automated computer-aided sperm analysis (CASA) systems, which often require labor-intensive pre-processing steps. While recent machine learning approaches, particularly convolutional neural networks (CNNs), have improved feature extraction from sperm images, achieving a fully automated and highly accurate system remains challenging due to the complexity of sperm morphology and the need for specialized image adjustments. This study presents a novel, end-to-end automated sperm morphology analysis framework based on vision transformers (ViTs), which processes raw sperm images from two benchmark datasets, Human Sperm Head Morphology (HuSHeM) and Sperm Morphology Image Data Set (SMIDS), without manual pre-processing. We conducted an extensive hyperparameter optimization study across eight ViT variants, evaluating learning rates, optimization algorithms, and data augmentation scales. Our experiments demonstrated that data augmentation significantly enhances ViT performance by improving generalization, particularly in limited-data scenarios. A comparative analysis of CNNs, hybrid models, and pure ViTs revealed that transformer-based architectures consistently outperform traditional methods. The BEiT_Base model achieved state-of-the-art accuracies of 92.5% (SMIDS) and 93.52% (HuSHeM), surpassing prior CNN-based approaches by 1.63% and 1.42%, respectively. Statistical significance (p < 0.05, t-test) confirmed these improvements. Visualization techniques (Attention Maps, Grad-CAM) further validated ViTs' superior ability to capture long-range spatial dependencies and discriminative morphological features, such as head shape and tail integrity. Our work bridges a critical gap in reproductive medicine by delivering a scalable, fully automated solution that eliminates manual intervention while improving diagnostic accuracy. These findings underscore the potential of transformer-based models in clinical andrology, with implications for broader applications in biomedical image analysis.
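
Vision transformers work on a sequence of image patches rather than on the whole image at once. To make that first step concrete, the sketch below splits a small grayscale image into non-overlapping square patches and flattens each one into a token. The function name `split_into_patches` and the list-of-lists image format are assumptions chosen for illustration; the learned linear projection and position embeddings that follow in a real ViT are omitted.

```python
def split_into_patches(image, patch):
    """Split a 2D image (list of rows) into flattened non-overlapping patches."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    patches = []
    for top in range(0, height, patch):          # walk patch rows
        for left in range(0, width, patch):      # walk patch columns
            patches.append([image[r][c]
                            for r in range(top, top + patch)
                            for c in range(left, left + patch)])
    return patches


# A 4x4 toy image whose pixel values are 0..15 in row-major order.
toy_image = [[4 * r + c for c in range(4)] for r in range(4)]
tokens = split_into_patches(toy_image, patch=2)
print(tokens)
```

Each inner list is one token; attention between tokens is what lets the model relate distant regions of the image, such as the head and tail mentioned in the abstract.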

PeerJ Computer Science 11: e3173 (published 2025-09-10). Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453802/pdf/
Citations: 0