
Latest publications in PeerJ Computer Science

Schizophrenia diagnosis based on diverse epoch size resting-state EEG using machine learning
IF 3.8 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-20 | DOI: 10.7717/peerj-cs.2170
Athar Alazzawı, Saif Aljumaili, Adil Deniz Duru, Osman Nuri Uçan, Oğuz Bayat, Paulo Jorge Coelho, Ivan Miguel Pires
Schizophrenia is a severe mental disorder that gradually impairs a person’s mental, social, and emotional faculties. Early detection with an accurate diagnosis is crucial to treating patients effectively. This study proposed a new method to classify schizophrenia in the resting state based on neural signals acquired from the brain by electroencephalography (EEG). The dataset consisted of 28 subjects, 14 in each group (schizophrenia and healthy controls). The data were collected from the scalp with 19 EEG channels at a 250 Hz sampling frequency. Because of brain-signal variability, we decomposed the EEG signals into five sub-bands using a band-pass filter, ensuring the best signal clarity and eliminating artifacts. The work was performed under several scenarios: first, traditional techniques were applied; second, augmented data (additive white Gaussian noise and stretched signals) were used. Additionally, we assessed Minimum Redundancy Maximum Relevance (MRMR) as a feature-reduction method. All these data scenarios were applied with three different window sizes (epochs) of 1, 2, and 5 s, using feature-extraction algorithms including the Fast Fourier Transform (FFT), Approximate Entropy (ApEn), Log Energy entropy (LogEn), Shannon Entropy (ShnEn), and kurtosis. L2 normalization was applied to the derived features, which positively affected the results. For classification, we applied four algorithms: K-nearest neighbor (KNN), support vector machine (SVM), quadratic discriminant analysis (QDA), and an ensemble classifier (EC). Across all scenarios, our evaluation showed that SVM with LogEn features and a 1-s window size achieved remarkable results on all evaluation metrics for schizophrenia diagnosis. This indicates that an accurate diagnosis of schizophrenia can be achieved through the right choice of features and classification model. Finally, we contrasted our results with recently published works using the same and a different dataset, where our method showed a notable improvement.
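To make the pipeline above concrete, the following is a minimal sketch (not the authors' code) of sub-band decomposition plus entropy features feeding an SVM. The band edges, synthetic signals, and helper names such as features_per_epoch are illustrative assumptions; only the 250 Hz sampling rate, five band-pass sub-bands, 1-s epochs, LogEn/ShnEn features, L2 normalization, and the SVM classifier come from the abstract.

```python
import numpy as np
from scipy.signal import butter, filtfilt
from sklearn.preprocessing import normalize
from sklearn.svm import SVC

FS = 250                                        # sampling rate reported in the abstract (Hz)
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 45)}   # assumed sub-band edges

def bandpass(x, lo, hi, fs=FS, order=4):
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

def features_per_epoch(x, epoch_s=1):
    """Decompose one channel into sub-bands, then compute LogEn and ShnEn per epoch."""
    bands = [bandpass(x, lo, hi) for lo, hi in BANDS.values()]
    n = FS * epoch_s
    rows = []
    for start in range(0, len(x) - n + 1, n):
        row = []
        for band in bands:
            e = band[start:start + n] ** 2 + 1e-12
            p = e / e.sum()
            row += [np.sum(np.log(e)), -np.sum(p * np.log(p))]   # log-energy, Shannon entropy
        rows.append(row)
    return np.array(rows)

# Toy two-class demonstration on synthetic 60-s signals (stand-ins for real EEG).
rng = np.random.default_rng(0)
X = np.vstack([features_per_epoch(s * rng.standard_normal(FS * 60)) for s in (1.0, 2.0)])
y = np.r_[np.zeros(60), np.ones(60)]
clf = SVC(kernel="rbf").fit(normalize(X), y)    # L2-normalized features, SVM classifier
print("training accuracy:", clf.score(normalize(X), y))
```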
Citations: 0
Multi-modal deep learning framework for damage detection in social media posts
IF 3.8 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-20 | DOI: 10.7717/peerj-cs.2262
Jiale Zhang, Manyu Liao, Yanping Wang, Yifan Huang, Fuyu Chen, Chiba Makiko
In crisis management, quickly identifying and helping affected individuals is key, especially when there is limited information about the survivors’ conditions. Traditional emergency systems often face issues with reachability and handling large volumes of requests. Social media has become crucial in disaster response, providing important information and aiding in rescues when standard communication systems fail. Due to the large amount of data generated on social media during emergencies, there is a need for automated systems to process this information effectively and help improve emergency responses, potentially saving lives. Therefore, accurately understanding visual scenes and their meanings is important for identifying damage and obtaining useful information. Our research introduces a framework for detecting damage in social media posts, combining the Bidirectional Encoder Representations from Transformers (BERT) architecture with advanced convolutional processing. This framework includes a BERT-based network for analyzing text and multiple convolutional neural network blocks for processing images. The results show that this combination is very effective, outperforming existing methods in accuracy, recall, and F1 score. In the future, this method could be enhanced by including more types of information, such as human voices or background sounds, to improve its prediction efficiency.
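As a rough illustration of the fusion idea described above, here is a hedged PyTorch sketch: a text embedding (standing in for the pooled output of the BERT-based branch) is concatenated with features from a small convolutional image branch and classified. All layer sizes and the two-class head are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class DamageClassifier(nn.Module):
    def __init__(self, text_dim=768, n_classes=2):
        super().__init__()
        self.image_branch = nn.Sequential(             # stand-in for the CNN blocks
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.fusion = nn.Sequential(                   # joint head over text + image features
            nn.Linear(text_dim + 32, 128), nn.ReLU(), nn.Linear(128, n_classes))

    def forward(self, text_emb, image):
        return self.fusion(torch.cat([text_emb, self.image_branch(image)], dim=1))

model = DamageClassifier()
text_emb = torch.randn(4, 768)           # e.g. BERT [CLS] embeddings for four posts
images = torch.randn(4, 3, 64, 64)       # the attached images
print(model(text_emb, images).shape)     # -> torch.Size([4, 2])
```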
Citations: 0
A novel adaptive weight bi-directional long short-term memory (AWBi-LSTM) classifier model for heart stroke risk level prediction in IoT
IF 3.8 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-20 | DOI: 10.7717/peerj-cs.2196
S Thumilvannan, R Balamanigandan
Stroke prediction has become a significant research area due to the increasing fatality rate. Hence, this article proposes a novel Adaptive Weight Bi-Directional Long Short-Term Memory (AWBi-LSTM) classifier model for stroke risk level prediction on IoT data. To train the classifier efficiently, missing data are removed with the hybrid genetic K-means algorithm (HKGA) and the data are aggregated. The features are then reduced with independent component analysis (ICA) to shrink the dataset, after which the correlated features are identified using T-test-based uniform distribution-gradient search rule-based elephant herding optimization for cluster analysis (T-test-UD-GSRBEHO). Next, fuzzy rule-based decisions are created from the T-test-UDEHOA correlated features to classify the risk levels accurately. The feature values obtained from the fuzzy logic are given to the AWBi-LSTM classifier, which predicts and classifies the risk level of heart disease and diabetes. After the risk level is predicted, the data are securely stored in the database using the MD5-Elliptic Curve Cryptography (MD5-ECC) technique. Testing the suggested risk prediction model on the stroke prediction dataset reveals its potential efficacy. With an accuracy of 99.6%, the results demonstrate that the proposed model outperforms existing techniques.
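The sketch below illustrates only the classifier component named in the abstract: a bidirectional LSTM whose forward and backward summaries are mixed by a learnable "adaptive" weight before classification. The weighting scheme, layer sizes, and three risk levels are my assumptions; the HKGA imputation, ICA, fuzzy-rule, and MD5-ECC storage steps are omitted.

```python
import torch
import torch.nn as nn

class AWBiLSTM(nn.Module):
    """Bi-LSTM whose two directions are mixed by a learnable weight before classification."""
    def __init__(self, n_features, hidden=64, n_classes=3):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True, bidirectional=True)
        self.alpha = nn.Parameter(torch.tensor(0.5))     # adaptive direction weight
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                                # x: (batch, time, n_features)
        out, _ = self.lstm(x)
        h = out.size(-1) // 2
        fwd, bwd = out[:, -1, :h], out[:, 0, h:]         # summary of each direction
        return self.head(self.alpha * fwd + (1 - self.alpha) * bwd)

model = AWBiLSTM(n_features=10)
print(model(torch.randn(8, 20, 10)).shape)               # -> torch.Size([8, 3])
```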
Citations: 0
Joint coordinate attention mechanism and instance normalization for COVID online comments text classification
IF 3.8 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-19 | DOI: 10.7717/peerj-cs.2240
Rong Zhu, Hua-Hui Gao, Yong Wang
Background: The majority of extant methodologies for text classification prioritize the extraction of feature representations from texts with high degrees of distinction, a process that may result in computational inefficiencies. To address this limitation, the current study proposes a novel approach that directly leverages label information to construct text representations. This integration aims to optimize the use of label data alongside textual content. Methods: The methodology began with separate pre-processing of texts and labels, followed by encoding through a projection layer. The research then utilized a conventional self-attention model enhanced by instance normalization (IN) and Gaussian Error Linear Unit (GELU) functions to assess emotional valences in review texts. An advanced self-attention mechanism was further developed to enable the efficient integration of text and label information. In the final stage, an adaptive label encoder was employed to extract relevant label information from the combined text-label data efficiently. Results: Empirical evaluations demonstrate that the proposed model achieves a significant improvement in classification performance, outperforming existing methodologies. This enhancement is quantitatively evidenced by its superior micro-F1 score, indicating the efficacy of integrating label information into text classification processes. This suggests that the model not only addresses computational inefficiencies but also enhances the accuracy of text classification.
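A rough PyTorch sketch of the kind of block the Methods describe is shown below: a self-attention layer whose residual connections are normalized with instance normalization (IN) instead of the usual layer normalization, with GELU in the feed-forward part. The dimensions and the exact placement of IN are assumptions for illustration.

```python
import torch
import torch.nn as nn

class INAttentionBlock(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.InstanceNorm1d(dim)
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def _in(self, x):                        # InstanceNorm1d expects (batch, channels, length)
        return self.norm(x.transpose(1, 2)).transpose(1, 2)

    def forward(self, x):                    # x: (batch, seq_len, dim)
        a, _ = self.attn(x, x, x)            # self-attention over the token sequence
        x = self._in(x + a)                  # residual + instance normalization
        return self._in(x + self.ff(x))      # GELU feed-forward, normalized again

block = INAttentionBlock()
tokens = torch.randn(2, 16, 128)             # e.g. projected text-plus-label embeddings
print(block(tokens).shape)                   # -> torch.Size([2, 16, 128])
```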
Citations: 0
Live software documentation of design pattern instances
IF 3.8 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-16 | DOI: 10.7717/peerj-cs.2090
Filipe Lemos, Filipe F. Correia, Ademar Aguiar, Paulo G. G. Queiroz
Background: Approaches to documenting the software patterns of a system can support intentionally and manually documenting them or automatically extracting them from the source code. Some of the approaches that we review do not maintain proximity between code and documentation. Others do not update the documentation after the code is changed. All of them present a low level of liveness. Approach: This work proposes an approach to improve the understandability of a software system by documenting the design patterns it uses. We regard the creation and the documentation of software as part of the same process and attempt to streamline the two activities. We achieve this by increasing the feedback about the pattern instances present in the code during development—i.e., by increasing liveness. Moreover, our approach maintains proximity between code and documentation and allows us to visualize the pattern instances under the same environment. We developed a prototype—DesignPatternDoc—for IntelliJ IDEA that continuously identifies pattern instances in the code, suggests them to the developer, generates the respective pattern-instance documentation, and enables live editing and visualization of that documentation. Results: To evaluate this approach, we conducted a controlled experiment with 21 novice developers. We asked participants to complete three tasks that involved understanding and evolving small software systems—up to six classes and 100 lines of code—and recorded the duration and the number of context switches. The results show that our approach helps developers spend less time understanding and documenting a software system when compared to using tools with a lower degree of liveness. Additionally, embedding documentation in the IDE and maintaining it close to the source code reduces context switching significantly.
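As a toy illustration of the detect-and-document idea (unrelated to the DesignPatternDoc prototype itself), the sketch below scans Java files with a crude Singleton heuristic and writes a documentation stub next to the source file, so the docs stay close to the code. The regex, file layout, and output format are assumptions.

```python
import pathlib
import re

SINGLETON_HINT = re.compile(r"private\s+static\s+(?:final\s+)?(\w+)\s+instance\b")

def document_patterns(root="src"):
    """Write a <file>.pattern.md stub next to every Java file that looks like a Singleton."""
    for path in pathlib.Path(root).rglob("*.java"):
        match = SINGLETON_HINT.search(path.read_text(errors="ignore"))
        if match:
            doc = path.with_suffix(".pattern.md")        # documentation stays beside the code
            doc.write_text(f"# Singleton instance\n\nClass `{match.group(1)}` in "
                           f"`{path.name}` appears to hold a static singleton field.\n")
            print("documented", path)

document_patterns()      # does nothing if ./src does not exist
```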
Citations: 0
Distilroberta2gnn: a new hybrid deep learning approach for aspect-based sentiment analysis
IF 3.8 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-16 | DOI: 10.7717/peerj-cs.2267
Aseel Alhadlaq, Alaa Altheneyan
In the field of natural language processing (NLP), aspect-based sentiment analysis (ABSA) is crucial for extracting insights from complex human sentiments towards specific text aspects. Despite significant progress, the field still faces challenges such as accurately interpreting subtle language nuances and the scarcity of high-quality, domain-specific annotated datasets. This study introduces the DistilRoBERTa2GNN model, an innovative hybrid approach that combines the DistilRoBERTa pre-trained model’s feature extraction capabilities with the dynamic sentiment classification abilities of graph neural networks (GNN). Our comprehensive, four-phase data preprocessing strategy is designed to enrich model training with domain-specific, high-quality data. In this study, we analyze four publicly available benchmark datasets: Rest14, Rest15, Rest16-EN, and Rest16-ESP, to rigorously evaluate the effectiveness of our novel DistilRoBERTa2GNN model in ABSA. For the Rest14 dataset, our model achieved an F1 score of 77.98%, precision of 78.12%, and recall of 79.41%. The Rest15 dataset shows that our model achieves an F1 score of 76.86%, precision of 80.70%, and recall of 79.37%. For the Rest16-EN dataset, our model reached an F1 score of 84.96%, precision of 82.77%, and recall of 87.28%. For Rest16-ESP (Spanish dataset), our model achieved an F1 score of 74.87%, with a precision of 73.11% and a recall of 76.80%. These metrics highlight our model’s competitive edge over different baseline models used in ABSA studies. This study addresses critical ABSA challenges and sets a new benchmark for sentiment analysis research, guiding future efforts toward enhancing model adaptability and performance across diverse datasets.
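A simplified sketch of the hybrid idea follows: contextual token embeddings (which the paper takes from DistilRoBERTa) are refined by a hand-rolled graph convolution over a token graph and pooled for three-way sentiment. The adjacency matrix, sizes, and pooling here are illustrative assumptions, and no graph library is required.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, dim_in, dim_out):
        super().__init__()
        self.lin = nn.Linear(dim_in, dim_out)

    def forward(self, x, adj):                     # x: (n_tokens, dim), adj: (n_tokens, n_tokens)
        a_hat = adj + torch.eye(adj.size(0))       # add self-loops
        deg = a_hat.sum(dim=1, keepdim=True)
        return torch.relu(self.lin(a_hat @ x / deg))   # mean-aggregate neighbouring tokens

n_tokens, dim = 12, 768
x = torch.randn(n_tokens, dim)                          # e.g. DistilRoBERTa last-hidden states
adj = (torch.rand(n_tokens, n_tokens) > 0.7).float()    # stand-in token/dependency graph
head = nn.Linear(128, 3)                                # negative / neutral / positive
logits = head(GCNLayer(dim, 128)(x, adj).mean(dim=0))   # pool the refined token features
print(logits.shape)                                     # -> torch.Size([3])
```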
Citations: 0
Pairing algorithm for varying data in cluster based heterogeneous wireless sensor networks
IF 3.8 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-16 | DOI: 10.7717/peerj-cs.2243
Zahida Shaheen, Kashif Sattar, Mukhtar Ahmed
In wireless sensor networks (WSNs), clustering is employed to extend the network’s lifespan. Each cluster has a designated cluster head. Pairing is another technique used within clustering to enhance network longevity. In this technique, nodes are grouped into pairs, with one node in an active state and the other in a sleep state to conserve energy. However, this pairing can lead to communication issues with the cluster head, as nodes in sleep mode cannot transmit data, potentially causing data loss. To address this issue, this study introduces an innovative approach called the “Awake Sleep Heterogeneous Nodes’ Pairing” (ASHNP) algorithm, which aims to improve transmission efficiency in WSNs operating in heterogeneous environments. In contrast, the Energy Efficient Sleep Awake Aware (EESAA) algorithm is customized for homogeneous environments; while suitable for such settings, it encounters challenges in handling data loss from sleep nodes. On the other hand, Energy and Traffic Aware Sleep Awake (ETASA) struggles with listening problems, limiting its efficiency in diverse environments. Through comprehensive comparative analysis, ASHNP demonstrates higher performance in data transmission efficiency, overcoming the shortcomings of EESAA and ETASA. Additionally, comparisons across various parameters, including energy consumption and the number of dead nodes, highlight ASHNP’s effectiveness in enhancing network reliability and resource utilization. These findings underscore the significance of ASHNP as a promising solution for optimizing data transmission in WSNs, particularly in heterogeneous environments. The analysis discloses that ASHNP reliably outperforms EESAA in maintaining node energy, with differences ranging from 1.5% to 10% across various rounds. Specifically, ASHNP achieves a data transmission rate 5.23% higher than EESAA and 21.73% higher than ETASA. These findings underscore the strength of ASHNP in sustaining node activity levels, showcasing its superiority in preserving network integrity and ensuring efficient data transmission across multiple rounds.
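The toy simulation below captures only the pairing intuition: one node of each pair sleeps while its partner senses and transmits, and the roles swap so the pair's energy drains evenly instead of one node dying early. It is a simplification of mine, not the ASHNP algorithm, and the energy costs are arbitrary.

```python
import random

class Node:
    def __init__(self, nid, energy):
        self.nid, self.energy, self.awake = nid, energy, True

def run_round(pairs, tx_cost=1.0, idle_cost=0.1):
    for a, b in pairs:
        active, sleeping = (a, b) if a.awake else (b, a)
        active.energy -= tx_cost                 # awake node senses and transmits
        sleeping.energy -= idle_cost             # sleeping partner only keeps a timer
        if active.energy < sleeping.energy:      # swap roles to balance the drain
            active.awake, sleeping.awake = False, True

random.seed(0)
nodes = [Node(i, energy=random.uniform(80, 120)) for i in range(10)]
pairs = [(nodes[i], nodes[i + 1]) for i in range(0, 10, 2)]
for _, partner in pairs:
    partner.awake = False                        # exactly one node per pair starts awake
for _ in range(50):
    run_round(pairs)
print([round(n.energy, 1) for n in nodes])       # energies stay roughly balanced per pair
```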
Citations: 0
Dynamic stacking ensemble for cross-language code smell detection
IF 3.8 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-15 | DOI: 10.7717/peerj-cs.2254
Hamoud Aljamaan
Code smells refer to poor design and implementation choices by software engineers that might affect the overall software quality. Code smells detection using machine learning models has become a popular area for building effective models that are capable of detecting different code smells in multiple programming languages. However, the process of building such effective models has not reached a state of stability, and most of the existing research focuses on Java code smells detection. The main objective of this article is to propose dynamic ensembles using two strategies, namely greedy search and backward elimination, which are capable of accurately detecting code smells in two programming languages (i.e., Java and Python), and which are less complex than full stacking ensembles. The detection performance of dynamic ensembles was investigated within the context of four Java and two Python code smells. The greedy search and backward elimination strategies yielded different base model lists for building dynamic ensembles. In comparison to full stacking ensembles, dynamic ensembles yielded less complex models when they were used to detect most of the investigated Java and Python code smells, with the backward elimination strategy resulting in less complex models. Dynamic ensembles were able to perform comparably against full stacking ensembles with no significant detection loss. This article concludes that dynamic stacking ensembles were able to facilitate effective and stable detection performance of Java and Python code smells over all base models and with less complexity than full stacking ensembles.
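A hedged scikit-learn sketch of the backward-elimination strategy follows: starting from all base models, a base model is dropped whenever its removal does not reduce cross-validated F1, yielding a smaller stacking ensemble. The dataset, base model list, and stopping rule are toy assumptions rather than the article's experimental setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=20, random_state=0)   # toy "smell" data

def stack_score(estimators):
    stack = StackingClassifier(estimators=estimators, final_estimator=LogisticRegression())
    return cross_val_score(stack, X, y, cv=5, scoring="f1").mean()

models = [("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
          ("dt", DecisionTreeClassifier(random_state=0)),
          ("knn", KNeighborsClassifier())]
best = stack_score(models)
improved = True
while improved and len(models) > 2:
    improved = False
    for i in range(len(models)):
        candidate = models[:i] + models[i + 1:]        # try removing one base model
        score = stack_score(candidate)
        if score >= best:                              # removal does not hurt: keep it out
            models, best, improved = candidate, score, True
            break
print([name for name, _ in models], round(best, 3))
```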
Citations: 0
A novel approach to secure communication in mega events through Arabic text steganography utilizing invisible Unicode characters
IF 3.8 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-15 | DOI: 10.7717/peerj-cs.2236
Esam Ali Khan
Mega events attract mega crowds, and many data exchange transactions take place among organizers, stakeholders, and individuals, which increases the risk of covert eavesdropping. Data hiding is essential for safeguarding the security, confidentiality, and integrity of information during mega events. It plays a vital role in reducing cyber risks and ensuring the seamless execution of these extensive gatherings. In this paper, a steganographic approach suitable for mega events communication is proposed. The proposed method utilizes the characteristics of Arabic letters and invisible Unicode characters to hide secret data, where each Arabic letter can hide two secret bits. The secret messages hidden using the proposed technique can be exchanged via emails, text messages, and social media, as these are the main communication channels in mega events. The proposed technique demonstrated notable performance, with a high capacity ratio averaging 178% and a perfect imperceptibility ratio of 100%, outperforming most of the previous work. In addition, it provides security performance comparable to previous approaches, with an average ratio of 72%. Furthermore, it is more robust than all related work, resisting 70% of the possible attacks.
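A simplified sketch of the hiding idea follows: after each Arabic letter of a cover text, one of four invisible Unicode characters encodes two secret bits, so the stego text renders identically to the cover. The specific invisible characters and bit mapping chosen here are assumptions; the paper's exact scheme may differ.

```python
INVISIBLE = ["\u200b", "\u200c", "\u200d", "\u2060"]   # ZWSP, ZWNJ, ZWJ, word joiner (assumed)

def is_arabic(ch: str) -> bool:
    return "\u0600" <= ch <= "\u06ff"

def hide(cover: str, secret_bits: str) -> str:
    out, i = [], 0
    for ch in cover:
        out.append(ch)
        if is_arabic(ch) and i + 2 <= len(secret_bits):
            out.append(INVISIBLE[int(secret_bits[i:i + 2], 2)])   # two bits per Arabic letter
            i += 2
    if i < len(secret_bits):
        raise ValueError("cover text too short for the secret message")
    return "".join(out)

def reveal(stego: str) -> str:
    return "".join(f"{INVISIBLE.index(ch):02b}" for ch in stego if ch in INVISIBLE)

stego = hide("السلام عليكم ورحمة الله", "1011001110")
print(stego == "السلام عليكم ورحمة الله")   # False: the stego text carries hidden characters
print(reveal(stego))                        # -> 1011001110
```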
Citations: 0
A pre-averaged pseudo nearest neighbor classifier
IF 3.8 | CAS Tier 4, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-08-13 | DOI: 10.7717/peerj-cs.2247
Dapeng Li
The k-nearest neighbor algorithm is a powerful classification method. However, its classification performance degrades on small samples that contain outliers. To address this issue, a pre-averaged pseudo nearest neighbor classifier (PAPNN) is proposed to improve classification performance. In the PAPNN rule, the pre-averaged categorical vectors are calculated by taking the average of any two points of the training sets in each class. Then, k pseudo nearest neighbors are chosen from the preprocessed vectors of every class to determine the category of a query point. The pre-averaged vectors can reduce the negative impact of outliers to some degree. Extensive experiments are conducted on nineteen numerical real data sets and three high-dimensional real data sets, comparing PAPNN to twelve other classification methods. The experimental results demonstrate that the proposed PAPNN rule is effective for classification tasks in the case of small samples with outliers.
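A small NumPy sketch of the PAPNN rule as summarized above: average every pair of training points within a class, then assign the query to the class with the smallest rank-weighted sum of distances to its k nearest pre-averaged vectors. The 1/rank weighting follows the usual pseudo nearest neighbor rule and is an assumption here.

```python
from itertools import combinations

import numpy as np

def pre_average(X):
    """Midpoints of every pair of training points in one class."""
    return np.array([(X[i] + X[j]) / 2 for i, j in combinations(range(len(X)), 2)])

def papnn_predict(query, class_data, k=3):
    best_label, best_dist = None, np.inf
    for label, X in class_data.items():
        P = pre_average(X)
        d = np.sort(np.linalg.norm(P - query, axis=1))[:k]    # k nearest pre-averaged vectors
        pseudo = np.sum(d / np.arange(1, len(d) + 1))         # rank-weighted pseudo distance
        if pseudo < best_dist:
            best_label, best_dist = label, pseudo
    return best_label

rng = np.random.default_rng(1)
class_data = {0: rng.normal(0, 1, size=(20, 2)), 1: rng.normal(3, 1, size=(20, 2))}
print(papnn_predict(np.array([2.8, 3.1]), class_data))        # -> 1
```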
Citations: 0