Pub Date: 2024-11-29; eCollection Date: 2024-01-01; DOI: 10.7717/peerj-cs.2364
Ala Saleh Alluhaidan, Mashael Maashi, Noha Negm, Shoayee Dlaim Alotaibi, Ibrahim R Alzahrani, Ahmed S Salama
In recent years, the Internet of Things has played a dominant role in various real-time problems and provided solutions via sensor signals. Monitoring patient health status through the Internet of Medical Things (IoMT) enables communication between wearable sensor devices and patients over a wireless network. Heart disease is one of the leading causes of the rising death rate worldwide, and it is diagnosed by fusing signals from multiple sensor devices. Much research has been done on predicting the disease and treating it correctly; however, existing approaches still struggle with accuracy, long processing times, and inefficiency. To overcome these issues, this paper proposes an efficient algorithm that fuses multi-sensor signals from wearable devices, classifies the medical signal data, and predicts heart disease using a hybrid technique of kernel random forest with the Black Hole Optimization algorithm (KRF-BHO). KRF-BHO is used for sensor data fusion, while XGBoost is used to classify echocardiogram images. On the multi-sensor data fusion dataset, the proposed KRF-BHO with XGBoost classifier achieves 94.12% accuracy in the training phase and 95.89% in the testing phase. Similarly, on the Cleveland dataset, it achieves 95.78% in the training phase and 96.21% in the testing phase.
{"title":"Kernel random forest with black hole optimization for heart diseases prediction using data fusion.","authors":"Ala Saleh Alluhaidan, Mashael Maashi, Noha Negm, Shoayee Dlaim Alotaibi, Ibrahim R Alzahrani, Ahmed S Salama","doi":"10.7717/peerj-cs.2364","DOIUrl":"10.7717/peerj-cs.2364","url":null,"abstract":"<p><p>In recent years, the Internet of Things has played a dominant role in various real-time problems and given solutions via sensor signals. Monitoring the patient health status of Internet of Medical Things (IoMT) facilitates communication between wearable sensor devices and patients through a wireless network. Heart illness is one of the reasons for the increasing death rate in the world. Diagnosing the disease is done by the fusion of multi-sensor device signals. Much research has been done in predicting the disease and treating it correctly. However, the issues are accuracy, consumption time, and inefficiency. To overcome these issues, this paper proposed an efficient algorithm for fusing the multi-sensor signals from wearable sensor devices, classifying the medical signal data and predicting heart disease using the hybrid technique of kernel random forest with the Black Hole Optimization algorithm (KRF-BHO). This KRF-BHO is used for sensor data fusion, while XG-Boost is used to classify echocardiogram images. Accuracy in the training phase with multi-sensor data fusion data set of proposed work KRF-BHO with XGBoost classifier is 94.12%; in the testing phase, the accuracy rate is 95.89%. 
Similarly, for the Cleveland Dataset, the proposed work KRF-BHO with XGBoost classifier is 95.78%; in the testing phase, the accuracy rate is 96.21%.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"10 ","pages":"e2364"},"PeriodicalIF":3.5,"publicationDate":"2024-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11622926/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142803164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
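The paper does not spell out the KRF-BHO fusion internals here, but feature-level fusion of wearable-sensor signals is commonly done by normalizing each channel and concatenating the results into one feature vector for the downstream classifier. A minimal sketch under that assumption (the channel names `ecg`, `spo2`, and `temp` are illustrative, not from the paper):

```python
from statistics import mean, pstdev

def zscore(xs):
    # Normalize one sensor channel to zero mean, unit variance.
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s if s else 0.0 for x in xs]

def fuse(channels):
    # Feature-level fusion: normalize each channel, then concatenate
    # into a single feature vector for the downstream classifier.
    fused = []
    for xs in channels.values():
        fused.extend(zscore(xs))
    return fused

# Hypothetical readings from three wearable sensor channels.
readings = {"ecg": [0.9, 1.1, 1.0], "spo2": [97, 98, 99], "temp": [36.5, 36.6, 36.7]}
vec = fuse(readings)
```

Normalizing per channel keeps high-magnitude sensors (e.g., SpO2 percentages) from dominating low-magnitude ones in the fused vector.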
Pub Date: 2024-11-29; eCollection Date: 2024-01-01; DOI: 10.7717/peerj-cs.2459
G Narayanee Nimeshika, Subitha D
In the rapidly evolving healthcare sector, using advanced technologies to improve medical classification systems has become crucial for enhancing patient care, diagnosis, and treatment planning. Two main challenges arise in this domain: (i) the imbalanced distribution of medical data, which leads to biased model performance, and (ii) the need to preserve patient privacy and comply with data protection regulations. The primary goal of this project is to develop a medical classification model for Alzheimer's disease detection that can effectively learn from decentralized and imbalanced datasets without compromising data privacy. The proposed system addresses these challenges by combining split federated learning (SFL) with conditional generative adversarial networks (cGANs). SFL enables a set of distributed agents to collaboratively train learning models without sharing their data, improving privacy, while the conditional GANs improve the model's ability to generalize across imbalanced classes by generating realistic synthetic samples for the minority classes. The proposed system achieved an accuracy of approximately 83.54% on the Alzheimer's disease classification dataset.
{"title":"Enhancing Alzheimer's disease classification through split federated learning and GANs for imbalanced datasets.","authors":"G Narayanee Nimeshika, Subitha D","doi":"10.7717/peerj-cs.2459","DOIUrl":"10.7717/peerj-cs.2459","url":null,"abstract":"<p><p>In the rapidly evolving healthcare sector, using advanced technologies to improve medical classification systems has become crucial for enhancing patient care, diagnosis, and treatment planning. There are two main challenges faced in this domain (i) imbalanced distribution of medical data, leading to biased model performance and (ii) the need to preserve patient privacy and comply with data protection regulations. The primary goal of this project is to develop a medical classification model for Alzheimer's disease detection that can effectively learn from decentralized and imbalanced datasets without compromising on data privacy. The proposed system aims to address these challenges by employing an approach that combines split federated learning (SFL) with conditional generative adversarial networks (cGANs) to enhance medical classification models. SFL enables efficient set of distributed agents that collaboratively train learning models without sharing their data, thus improving data privacy and the integration of conditional GANs aims to improve the model's ability to generalize across imbalanced classes by generating realistic synthetic samples for minority classes. 
The proposed system provided an accuracy of approximately 83.54 percentage for the Alzheimer's disease classification dataset.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"10 ","pages":"e2459"},"PeriodicalIF":3.5,"publicationDate":"2024-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11623002/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142803359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
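The SFL details above are not reproducible from the abstract alone, but one core ingredient of any federated scheme, aggregating client model parameters without sharing raw data, can be sketched FedAvg-style (the two-client weight vectors and dataset sizes below are invented for illustration):

```python
def aggregate(client_weights, client_sizes):
    # Weighted average of per-client parameter vectors; weights are
    # proportional to each client's local dataset size, so clients
    # with more data pull the global model harder.
    total = sum(client_sizes)
    dim = len(client_weights[0])
    agg = [0.0] * dim
    for w, n in zip(client_weights, client_sizes):
        for i in range(dim):
            agg[i] += (n / total) * w[i]
    return agg

# Hypothetical parameter vectors from two hospitals.
clients = [[1.0, 2.0], [3.0, 4.0]]
sizes = [1, 3]
global_w = aggregate(clients, sizes)
```

Only the parameter vectors cross the network; the patient records that produced them stay on each client, which is the privacy property the paper relies on.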
Pub Date: 2024-11-29; eCollection Date: 2024-01-01; DOI: 10.7717/peerj-cs.2478
Shabana Ramzan, Basharat Ali, Ali Raza, Ibrar Hussain, Norma Latif Fitriyani, Yeonghyeon Gu, Muhammad Syafrudin
A thriving agricultural system is the cornerstone of an expanding economy in agricultural countries. Farmers' crop productivity is significantly reduced when they choose a crop without considering environmental factors and soil characteristics. Crop prediction enables farmers to select crops that maximize yield and earnings, and accurate crop prediction is a central concern of agricultural research. Recently, recommender systems (RS) have gained much attention and are being utilized in various fields such as e-commerce, music, health, text, and movies, and machine learning techniques can help predict the crop accurately. We propose an innovative artificial neural network (ANN)-based crop prediction system (CPS) to address this issue. The parameters considered during sensor-based soil data collection for this study are nitrogen, phosphorus, potassium, temperature, humidity, pH, rainfall, electrical conductivity, and soil texture. The Python programming language is used to design and validate the proposed system, and its accuracy and reliability are assessed using accuracy, precision, recall, and F1-score. We also optimized the proposed CPS by performing a hyperparameter optimization analysis of the applied learning methods. The proposed CPS model reaches 99% accuracy on both the real-time collected and state-of-the-art datasets. The experimental results show that our proposed solution assists farmers in selecting the right crop and producing at their best, increasing their profit.
{"title":"An innovative artificial neural network model for smart crop prediction using sensory network based soil data.","authors":"Shabana Ramzan, Basharat Ali, Ali Raza, Ibrar Hussain, Norma Latif Fitriyani, Yeonghyeon Gu, Muhammad Syafrudin","doi":"10.7717/peerj-cs.2478","DOIUrl":"10.7717/peerj-cs.2478","url":null,"abstract":"<p><p>A thriving agricultural system is the cornerstone of an expanding economy of agricultural countries. Farmers' crop productivity is significantly reduced when they choose the crop without considering environmental factors and soil characteristics. Crop prediction enables farmers to select crops that maximize crop yield and earnings. Accurate crop prediction is mainly concerned with agricultural research, which plays a major role in selecting accurate crops based on environmental factors and soil characteristics. Recently, recommender systems (RS) have gained much attention and are being utilized in various fields such as e-commerce, music, health, text, movies etc. Machine learning techniques can help predict the crop accurately. We proposed an innovative artificial neural network (ANN) based crop prediction system (CPS) to address the farmer's issue. The parameters considered during sensor-based soil data collection for this study are nitrogen, phosphorus, potassium, temperature, humidity, pH, rainfall, electrical conductivity, and soil texture. Python programming language is used to design and validate the proposed system. The accuracy and reliability of the proposed CPS are assessed by using accuracy, precision, recall, and F1-score. We also optimized the proposed CPS by performing a hyperparameter Optimization analysis of applied learning methods. The proposed CPS model accuracy for both real-time collected and state-of-the-art datasets is 99%. 
The experimental results show that our proposed solution assists farmers in selecting the accurate crop and producing at their best, increasing their profit.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"10 ","pages":"e2478"},"PeriodicalIF":3.5,"publicationDate":"2024-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11623066/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142803203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
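The evaluation metrics named above (precision, recall, F1-score) have standard definitions; for a single positive class they reduce to the following sketch, with toy labels for illustration:

```python
def prf1(y_true, y_pred, positive=1):
    # Confusion-matrix counts for the chosen positive class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy ground truth vs. predictions for one crop class.
p, r, f = prf1([1, 0, 1, 1], [1, 0, 0, 1])
```

For a multi-class CPS these would be computed per crop class and then macro- or weighted-averaged.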
The efficiency of machine learning (ML) algorithms plays a critical role in their deployment across various applications, particularly those with resource constraints or real-time requirements. This article presents a comprehensive framework for evaluating ML algorithm efficiency by incorporating metrics such as training time, prediction time, memory usage, and computational resource utilization. The proposed methodology involves a multistep process: collecting raw metrics, normalizing them, applying the Analytic Hierarchy Process (AHP) to determine weights, and computing a composite efficiency score. We applied this framework to two distinct datasets: medical image data and agricultural crop prediction data. The results demonstrate that our approach effectively differentiates algorithm performance based on the specific demands of each application. For medical image analysis, the framework highlights strengths in robustness and adaptability, whereas for agricultural crop prediction, it emphasizes scalability and resource management. This study provides valuable insights into optimizing ML algorithms and offers a versatile tool for practitioners to assess and enhance algorithmic efficiency across diverse domains.
{"title":"A simplified approach for efficiency analysis of machine learning algorithms.","authors":"Muthuramalingam Sivakumar, Sudhaman Parthasarathy, Thiyagarajan Padmapriya","doi":"10.7717/peerj-cs.2418","DOIUrl":"10.7717/peerj-cs.2418","url":null,"abstract":"<p><p>The efficiency of machine learning (ML) algorithms plays a critical role in their deployment across various applications, particularly those with resource constraints or real-time requirements. This article presents a comprehensive framework for evaluating ML algorithm efficiency by incorporating metrics, such as training time, prediction time, memory usage, and computational resource utilization. The proposed methodology involves a multistep process: collecting raw metrics, normalizing them, applying the Analytic Hierarchy Process (AHP) to determine weights, and computing a composite efficiency score. We applied this framework to two distinct datasets: medical image data and agricultural crop prediction data. The results demonstrate that our approach effectively differentiates algorithm performance based on the specific demands of each application. For medical image analysis, the framework highlights strengths in robustness and adaptability, whereas for agricultural crop prediction, it emphasizes scalability and resource management. 
This study provides valuable insights into optimizing ML algorithms, and offers a versatile tool for practitioners to assess and enhance algorithmic efficiency across diverse domains.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"10 ","pages":"e2418"},"PeriodicalIF":3.5,"publicationDate":"2024-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11623197/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142803214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
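The multistep process described above (normalize raw metrics, derive AHP weights, compute a composite score) can be sketched end to end. This uses the geometric-mean approximation of the AHP principal eigenvector; the two metrics, the pairwise judgment (training time 3x as important as memory), and the raw numbers are all illustrative, not taken from the paper:

```python
import math

def minmax(xs, lower_is_better=False):
    # Min-max normalize to [0, 1]; flip so that 1 is always "best".
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [1.0] * len(xs)
    norm = [(x - lo) / (hi - lo) for x in xs]
    return [1 - v for v in norm] if lower_is_better else norm

def ahp_weights(pairwise):
    # Geometric-mean-of-rows approximation of the principal
    # eigenvector of the AHP reciprocal comparison matrix.
    gm = [math.prod(row) ** (1 / len(row)) for row in pairwise]
    s = sum(gm)
    return [g / s for g in gm]

# Illustrative judgment: training time 3x as important as memory usage.
P = [[1, 3],
     [1/3, 1]]
w = ahp_weights(P)

train_time = [10.0, 40.0]    # seconds per algorithm (lower is better)
memory     = [200.0, 100.0]  # MB per algorithm (lower is better)
t = minmax(train_time, lower_is_better=True)
m = minmax(memory, lower_is_better=True)

# Composite efficiency score: weighted sum of normalized metrics.
scores = [w[0] * ti + w[1] * mi for ti, mi in zip(t, m)]
```

With these toy numbers the first algorithm wins: it is much faster, and speed carries three-quarters of the weight.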
Pub Date: 2024-11-28; eCollection Date: 2024-01-01; DOI: 10.7717/peerj-cs.2502
Krishnakumar Vaithianathan, Julian Benadit Pernabas, Latha Parthiban, Mamoon Rashid, Sultan S Alshamrani
Several deep learning networks have been developed to identify the complex atrophic patterns of Alzheimer's disease (AD). Among the activation functions used in deep neural networks, the rectified linear unit is the most common. Although these functions have been analyzed individually, group activations and their interpretations remain unexplored for neuroimaging analysis. In this study, a feature extraction technique based on normalized group activations is proposed that can be applied to both structural MRI and resting-state fMRI (rs-fMRI). The method has two phases: multi-trait condensed feature extraction networks and regional association networks. The first phase extracts features from various brain regions using different multi-layered convolutional networks. Then, multiple regional association networks with normalized group activations are trained for all regional pairs, and their outputs are fed to a classifier. To provide an unbiased estimate, an automated diagnosis system equipped with the proposed feature extraction is designed and analyzed on multi-cohort Alzheimer's Disease Neuroimaging Initiative (ADNI) data to predict multiple stages of AD. The system is also trained and tested on heterogeneous features such as non-transformed features, curvelets, wavelets, shearlets, textures, and scattering operators. Baseline scans of 185 rs-fMRIs and 1,442 MRIs from the ADNI-1, ADNI-2, and ADNI-GO datasets are used for validation. For MCI (mild cognitive impairment) classification, performance increases by 1-4%. These outcomes demonstrate the good discriminatory behaviour of the proposed features and their efficiency in classifying multiple stages of AD from rs-fMRI time-series and MRI data.
{"title":"Normalized group activations based feature extraction technique using heterogeneous data for Alzheimer's disease classification.","authors":"Krishnakumar Vaithianathan, Julian Benadit Pernabas, Latha Parthiban, Mamoon Rashid, Sultan S Alshamrani","doi":"10.7717/peerj-cs.2502","DOIUrl":"10.7717/peerj-cs.2502","url":null,"abstract":"<p><p>Several deep learning networks are developed to identify the complex atrophic patterns of Alzheimer's disease (AD). Among various activation functions used in deep neural networks, the rectifier linear unit is the most used one. Even though these functions are analyzed individually, group activations and their interpretations are still not explored for neuroimaging analysis. In this study, a unique feature extraction technique based on normalized group activations that can be applied to both structural MRI and resting-state-fMRI (rs-fMRI) is proposed. This method is split into two phases: multi-trait condensed feature extraction networks and regional association networks. The initial phase involves extracting features from various brain regions using different multi-layered convolutional networks. Then, multiple regional association networks with normalized group activations for all the regional pairs are trained and the output of these networks is given as input to a classifier. To provide an unbiased estimate, an automated diagnosis system equipped with the proposed feature extraction is designed and analyzed on multi-cohort Alzheimer's Disease Neuroimaging Initiative (ADNI) data to predict multi-stages of AD. This system is also trained/tested on heterogeneous features such as non-transformed features, curvelets, wavelets, shearlets, textures, and scattering operators. Baseline scans of 185 rs-fMRIs and 1442 MRIs from ADNI-1, ADNI-2, and ADNI-GO datasets are used for validation. For MCI (mild cognitive impairment) classifications, there is an increase of 1-4% in performance. 
The outcome demonstrates the good discriminatory behaviour of the proposed features and its efficiency on rs-fMRI time-series and MRI data to classify multiple stages of AD.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"10 ","pages":"e2502"},"PeriodicalIF":3.5,"publicationDate":"2024-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11622987/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142803382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
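The paper's exact normalization scheme is not given in this abstract; one plausible reading of "normalized group activations", scaling a group of ReLU outputs to unit L2 norm so groups are comparable across regions, can be sketched as follows (an assumption for illustration, not the authors' stated formula):

```python
import math

def relu(x):
    return max(0.0, x)

def normalized_group_activation(group):
    # Apply ReLU to each unit, then scale the group so its L2 norm
    # is 1; all-zero groups are returned unchanged to avoid 0/0.
    acts = [relu(x) for x in group]
    norm = math.sqrt(sum(a * a for a in acts))
    return acts if norm == 0 else [a / norm for a in acts]

g = normalized_group_activation([3.0, -1.0, 4.0])
```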
Pub Date: 2024-11-28; eCollection Date: 2024-01-01; DOI: 10.7717/peerj-cs.2395
Sergei Koltcov, Anton Surkov, Olessia Koltsova, Vera Ignatenko
Recent advancements in large language models (LLMs) have opened new possibilities for developing conversational agents (CAs) in various subfields of mental healthcare. However, this progress is hindered by limited access to high-quality training data, often due to privacy concerns and high annotation costs for low-resource languages. A potential solution is to create human-AI annotation systems that utilize extensive public-domain user-to-user and user-to-professional discussions on social media. These discussions, however, are extremely noisy, necessitating the adaptation of LLMs for fully automatic cleaning and pre-classification to reduce human annotation effort. To date, research on LLM-based annotation in the mental health domain is extremely scarce. In this article, we explore the potential of zero-shot classification using four LLMs to select and pre-classify texts into topics representing psychiatric disorders, in order to facilitate the future development of CAs for disorder-specific counseling. We use 64,404 Russian-language texts from online discussion threads labeled with the seven most commonly discussed disorders: depression, neurosis, paranoia, anxiety disorder, bipolar disorder, obsessive-compulsive disorder, and borderline personality disorder. Our research shows that while preliminary data filtering using zero-shot classification slightly improves results, LLM fine-tuning makes a far larger contribution to quality. Both standard and natural language inference (NLI) modes of fine-tuning increase classification accuracy by more than three times compared to non-fine-tuned training with preliminarily filtered data.
Additionally, we demonstrate that lemmatization does not affect classification quality and that multilingual models using texts in their original language perform slightly better than English-only models using automatically translated texts. Finally, we introduce our dataset and model as the first openly available Russian-language resource for developing conversational agents in the domain of mental health counseling.
{"title":"Using large language models for extracting and pre-annotating texts on mental health from noisy data in a low-resource language.","authors":"Sergei Koltcov, Anton Surkov, Olessia Koltsova, Vera Ignatenko","doi":"10.7717/peerj-cs.2395","DOIUrl":"10.7717/peerj-cs.2395","url":null,"abstract":"<p><p>Recent advancements in large language models (LLMs) have opened new possibilities for developing conversational agents (CAs) in various subfields of mental healthcare. However, this progress is hindered by limited access to high-quality training data, often due to privacy concerns and high annotation costs for low-resource languages. A potential solution is to create human-AI annotation systems that utilize extensive public domain user-to-user and user-to-professional discussions on social media. These discussions, however, are extremely noisy, necessitating the adaptation of LLMs for fully automatic cleaning and pre-classification to reduce human annotation effort. To date, research on LLM-based annotation in the mental health domain is extremely scarce. In this article, we explore the potential of zero-shot classification using four LLMs to select and pre-classify texts into topics representing psychiatric disorders, in order to facilitate the future development of CAs for disorder-specific counseling. We use 64,404 Russian-language texts from online discussion threads labeled with seven most commonly discussed disorders: depression, neurosis, paranoia, anxiety disorder, bipolar disorder, obsessive-compulsive disorder, and borderline personality disorder. Our research shows that while preliminary data filtering using zero-shot technology slightly improves classification, LLM fine-tuning makes a far larger contribution to its quality. Both standard and natural language inference (NLI) modes of fine-tuning increase classification accuracy by more than three times compared to non-fine-tuned training with preliminarily filtered data. 
Although NLI fine-tuning achieves slightly higher accuracy (0.64) than the standard approach, it is six times slower, indicating a need for further experimentation with NLI hypothesis engineering. Additionally, we demonstrate that lemmatization does not affect classification quality and that multilingual models using texts in their original language perform slightly better than English-only models using automatically translated texts. Finally, we introduce our dataset and model as the first openly available Russian-language resource for developing conversational agents in the domain of mental health counseling.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"10 ","pages":"e2395"},"PeriodicalIF":3.5,"publicationDate":"2024-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11623104/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142803417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
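In the NLI mode mentioned above, each candidate disorder label is rewritten as a hypothesis and paired with the text as a premise; the NLI model scores entailment for each pair and the top-scoring label wins. A sketch of that hypothesis engineering, using the paper's seven labels (the template wording and example text are illustrative):

```python
DISORDERS = ["depression", "neurosis", "paranoia", "anxiety disorder",
             "bipolar disorder", "obsessive-compulsive disorder",
             "borderline personality disorder"]

def nli_pairs(text, labels=DISORDERS, template="This text discusses {}."):
    # One (premise, hypothesis) pair per candidate label; an NLI model
    # scores each pair and the label with the highest entailment
    # probability is taken as the prediction.
    return [(text, template.format(lab)) for lab in labels]

pairs = nli_pairs("I can't sleep and feel hopeless every day.")
```

The "hypothesis engineering" the authors point to is exactly the choice of `template` here: different phrasings of the hypothesis can shift NLI accuracy noticeably.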
Pub Date: 2024-11-28; eCollection Date: 2024-01-01; DOI: 10.7717/peerj-cs.2536
Duy Ho Vo Hoang, Huy Vo Quoc, Bui Thanh Hung
Extracting information from scanned images is a critical task with far-reaching practical implications. Traditional methods often fall short by inadequately leveraging both image and text features, leading to less accurate and efficient outcomes. In this study, we introduce ConBGAT, a cutting-edge model that seamlessly integrates convolutional neural networks (CNNs), Transformers, and graph attention networks to address these shortcomings. Our approach constructs detailed graphs from text regions within images, utilizing advanced Optical Character Recognition to accurately detect and interpret characters. By combining features extracted by CNNs for images with Distilled Bidirectional Encoder Representations from Transformers (DistilBERT) for text, our model achieves a comprehensive and efficient data representation. Rigorous testing on real-world datasets shows that ConBGAT significantly outperforms existing methods, demonstrating its superior capability across multiple evaluation metrics. This advancement not only enhances accuracy but also sets a new benchmark for information extraction in scanned images.
{"title":"ConBGAT: a novel model combining convolutional neural networks, transformer and graph attention network for information extraction from scanned image.","authors":"Duy Ho Vo Hoang, Huy Vo Quoc, Bui Thanh Hung","doi":"10.7717/peerj-cs.2536","DOIUrl":"10.7717/peerj-cs.2536","url":null,"abstract":"<p><p>Extracting information from scanned images is a critical task with far-reaching practical implications. Traditional methods often fall short by inadequately leveraging both image and text features, leading to less accurate and efficient outcomes. In this study, we introduce ConBGAT, a cutting-edge model that seamlessly integrates convolutional neural networks (CNNs), Transformers, and graph attention networks to address these shortcomings. Our approach constructs detailed graphs from text regions within images, utilizing advanced Optical Character Recognition to accurately detect and interpret characters. By combining superior extracted features of CNNs for image and Distilled Bidirectional Encoder Representations from Transformers (DistilBERT) for text, our model achieves a comprehensive and efficient data representation. Rigorous testing on real-world datasets shows that ConBGAT significantly outperforms existing methods, demonstrating its superior capability across multiple evaluation metrics. 
This advancement not only enhances accuracy but also sets a new benchmark for information extraction in scanned image.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"10 ","pages":"e2536"},"PeriodicalIF":3.5,"publicationDate":"2024-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11622835/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142803290","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
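ConBGAT's exact graph construction is not detailed in this abstract; a common scheme for building graphs from OCR text regions, one node per detected box with edges between regions whose centers lie within a distance threshold, can be sketched as follows (the boxes and radius are illustrative):

```python
import math

def centers(boxes):
    # boxes: (x, y, w, h) per OCR-detected text region.
    return [(x + w / 2, y + h / 2) for x, y, w, h in boxes]

def build_edges(boxes, radius):
    # Undirected edge between any two regions whose centers are within
    # `radius`; these edges are what a graph attention layer would
    # aggregate over when combining image and text node features.
    c = centers(boxes)
    edges = []
    for i in range(len(c)):
        for j in range(i + 1, len(c)):
            if math.dist(c[i], c[j]) <= radius:
                edges.append((i, j))
    return edges

# Three hypothetical text boxes: two on the same line, one far below.
boxes = [(0, 0, 10, 4), (12, 0, 10, 4), (0, 50, 10, 4)]
edges = build_edges(boxes, radius=15)
```

Spatial proximity is only one choice of edge criterion; reading-order or same-line heuristics are common alternatives.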
Traditional methods for detecting seed germination rates often involve lengthy experiments that result in damaged seeds. This study selected the Zheng Dan-958 maize variety to predict germination rates using multi-source information fusion and a random forest (RF) algorithm. Images of the seeds and internal cracks were captured with a digital camera, and the dielectric constant of the seeds was measured using a flat capacitor and converted into voltage readings. Features such as color, shape, texture, crack count, and normalized voltage were used to form feature vectors. Various prediction algorithms, including random forest (RF), radial basis function (RBF) neural networks, support vector machine (SVM), and extreme learning machine (ELM), were developed and tested against standard germination experiments. The RF model stood out, with a training time of 5.18 s and the highest accuracy of 92.88%, along with a mean absolute error (MAE) of 0.913 and a root mean square error (RMSE) of 1.163. The study concluded that the RF model, combined with multi-source information fusion, offers a feasible and nondestructive method for quickly and accurately predicting maize seed germination rates.
"Optimizing maize germination forecasts with random forest and data fusion techniques." Lili Wu, Yuqing Xing, Kaiwen Yang, Wenqiang Li, Guangyue Ren, Debang Zhang, Huiping Fan. PeerJ Computer Science 10:e2468. DOI: 10.7717/peerj-cs.2468
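The multi-source fusion step described above, combining image-derived features with the normalized voltage into one feature vector per seed, can be sketched in a few lines. The field names and values below are illustrative assumptions, not data from the paper:

```python
import numpy as np

# Hypothetical per-seed measurements (names and values are illustrative):
# color (mean RGB), shape (area, eccentricity), texture (contrast),
# crack count, and the capacitor-derived voltage reading in volts.
seeds = [
    {"color": [112.0, 98.0, 45.0], "shape": [5.1, 0.62], "texture": [0.34], "cracks": 2, "voltage": 1.85},
    {"color": [120.0, 101.0, 50.0], "shape": [4.8, 0.58], "texture": [0.29], "cracks": 0, "voltage": 2.10},
    {"color": [99.0, 90.0, 41.0], "shape": [5.5, 0.70], "texture": [0.41], "cracks": 4, "voltage": 1.60},
]

def fuse_features(records):
    """Concatenate image-derived features with the voltage reading into one
    vector per seed, then min-max normalize each column to [0, 1]."""
    raw = np.array([r["color"] + r["shape"] + r["texture"] + [r["cracks"], r["voltage"]]
                    for r in records])
    lo, hi = raw.min(axis=0), raw.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
    return (raw - lo) / span

X = fuse_features(seeds)
print(X.shape)  # one 8-dimensional fused vector per seed: (3, 8)
```

The fused matrix `X` would then be the input to whichever regressor is trained (RF in the paper's best-performing configuration).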
Pub Date : 2024-11-28eCollection Date: 2024-01-01DOI: 10.7717/peerj-cs.2569
Yongyu Luo, Zhongqiang Luo
The purpose of infrared and visible image fusion is to obtain an image that includes both the infrared target and visible information. However, many existing infrared and visible image fusion methods prioritize the fusion effect through complex designs while ignoring the influence of attention mechanisms on deep features, so the fused image lacks visible-light texture information. To solve these problems, an infrared and visible image fusion method based on dense gradient attention residuals is proposed in this article. First, squeeze-and-excitation networks are integrated into the gradient convolutional dense block, and a new gradient attention residual dense block is designed to enhance the network's ability to extract important information. To retain more original image information, a feature gradient attention module is introduced to improve the retention of detail. In the fusion layer, an adaptive weighted energy attention network based on an energy fusion strategy further preserves infrared and visible details. In experimental comparisons on the TNO dataset, our method performs well on several evaluation indicators. Specifically, on average gradient (AG), information entropy (EN), spatial frequency (SF), mutual information (MI), and standard deviation (SD), our method reached 6.90, 7.46, 17.30, 2.62, and 54.99, respectively, improvements of 37.31%, 6.55%, 32.01%, 8.16%, and 10.01% over five commonly used comparison methods. These results demonstrate the effectiveness and superiority of our method.
"Infrared and visible image fusion algorithm based on gradient attention residuals dense block." PeerJ Computer Science 10:e2569.
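The evaluation indexes above have standard definitions that can be computed directly from the fused image. The implementations below follow the common textbook formulas for average gradient (AG), information entropy (EN), and spatial frequency (SF); they are a sketch, not the authors' exact evaluation code:

```python
import numpy as np

def average_gradient(img):
    """AG: mean magnitude of local gray-level change (larger = sharper).
    Uses forward differences on the interior of the image."""
    img = img.astype(float)
    dx = np.diff(img, axis=1)[:-1, :]
    dy = np.diff(img, axis=0)[:, :-1]
    return float(np.mean(np.sqrt((dx**2 + dy**2) / 2.0)))

def entropy(img, levels=256):
    """EN: Shannon entropy of the gray-level histogram, in bits."""
    hist = np.bincount(img.astype(np.uint8).ravel(), minlength=levels)
    p = hist[hist > 0] / hist.sum()
    return float(-np.sum(p * np.log2(p)) + 0.0)

def spatial_frequency(img):
    """SF: combined row-wise and column-wise gray-level variation."""
    img = img.astype(float)
    rf = np.sqrt(np.mean(np.diff(img, axis=1) ** 2))  # row frequency
    cf = np.sqrt(np.mean(np.diff(img, axis=0) ** 2))  # column frequency
    return float(np.sqrt(rf**2 + cf**2))

# A flat image scores zero on all three; adding structure raises the scores.
flat = np.full((32, 32), 128, dtype=np.uint8)
checker = (np.indices((32, 32)).sum(axis=0) % 2 * 255).astype(np.uint8)
print(average_gradient(checker))  # 255.0: every neighbor differs by 255
```

Higher AG, EN, and SF on the fused image indicate that more gradient, gray-level, and frequency content from the source images survived the fusion.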
Pub Date : 2024-11-27eCollection Date: 2024-01-01DOI: 10.7717/peerj-cs.2487
Kholoud Althobaiti, Nawal Alsufyani
The increased sophistication and frequency of phishing attacks that target organizations necessitate a comprehensive cyber security strategy that handles phishing from several perspectives, such as phishing detection and the testing of users' awareness. Through a systematic review of 163 research articles, we analyzed organization-oriented phishing research to categorize existing work and identify future opportunities. We find that a notable number of studies concentrate on phishing detection and awareness, while other layers of protection, such as the mitigation of phishing, are overlooked. In addition, we draw attention to shortcomings and challenges. We believe that this article will provide opportunities for future research on phishing in organizations.
"A review of organization-oriented phishing research." PeerJ Computer Science 10:e2487.