With the rapid development and widespread adoption of intelligent vehicles and the Internet of Vehicles (IoV), vehicle security has become a growing concern. Modern vehicles manage key components via electronic control units (ECUs) connected over the controller area network (CAN). CAN bus intrusion techniques are the primary means of compromising the IoV, posing a significant threat to the normal operation of critical vehicle systems, such as the power system. However, existing attack detection methods still fall short in feature extraction and in the diversity of attack types they can detect. To address these challenges, we propose an intrusion detection framework named basic ensemble and pioneer class decision (BEPCD). The framework first constructs a 15-dimensional feature model to hierarchically characterize CAN bus messages. BEPCD then combines multi-model ensemble learning with a pioneer class selector and a confidence-driven voting mechanism, enabling precise classification of both conventional and emerging attack patterns. Additionally, we analyze the importance of different data features across four machine learning algorithms. Experimental results on public datasets demonstrate that the proposed framework effectively detects intrusions on the in-vehicle CAN bus. Compared with other intrusion detection frameworks, ours improves the overall F1-score by 1% to 5%. Notably, it achieves an approximately 77.5% performance improvement in detecting replay attacks.
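The confidence-driven voting the abstract describes can be sketched roughly as follows. The function name, the pioneer-class shortcut, and the 0.9 threshold are illustrative assumptions, not the paper's actual implementation, which also involves a 15-dimensional feature model not reproduced here.

```python
import numpy as np

def confidence_vote(prob_matrices, pioneer_class=None, threshold=0.9):
    """Combine per-model class-probability outputs by confidence-weighted voting.

    prob_matrices: list of (n_samples, n_classes) arrays, one per base model.
    pioneer_class: optional class index; if any model predicts it with high
    confidence, that decision wins outright (a rough stand-in for the paper's
    pioneer class selector).
    """
    probs = np.stack(prob_matrices)              # (n_models, n_samples, n_classes)
    if pioneer_class is not None:
        confident = probs[:, :, pioneer_class] >= threshold
        pioneer_hit = confident.any(axis=0)      # (n_samples,)
    else:
        pioneer_hit = np.zeros(probs.shape[1], dtype=bool)
    # Weight each model's vote by its own confidence (its max class probability).
    weights = probs.max(axis=2, keepdims=True)   # (n_models, n_samples, 1)
    combined = (probs * weights).sum(axis=0)     # (n_samples, n_classes)
    preds = combined.argmax(axis=1)
    if pioneer_class is not None:
        preds[pioneer_hit] = pioneer_class
    return preds
```

In this sketch a rare attack class can be promoted to "pioneer" status so that one confident base model suffices to flag it, while ordinary classes fall back to weighted voting.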
BEPCD: an ensemble learning-based intrusion detection framework for in-vehicle CAN bus. Bocheng Xu, Fei Cao, Xilong Li, Song Tian, Wenbo Deng, Shudan Yue. PeerJ Computer Science 11:e3108 (2025-08-19). DOI: 10.7717/peerj-cs.3108. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453696/pdf/
Pub Date: 2025-08-18. eCollection Date: 2025-01-01. DOI: 10.7717/peerj-cs.3109
Hongxia Wang, Teng Lv
With the increasing prevalence and diversity of imaging devices, palmprint recognition has emerged as a technology well suited to the demands of the modern era. However, traditional manual methods are limited in how effectively they extract palmprint principal line features. To address this, we introduce a novel data augmentation method. First, the wide line extraction (WLE) filter is used to extract the prominent principal lines of palmprints by leveraging their direction and width characteristics. Then, a Gabor filter is applied to the WLE output to purify the features and remove fine lines, since fine lines introduce noise and redundancy that interfere with accurate extraction of the principal line features crucial for palmprint recognition. Evaluated across four common Vision Transformer (ViT) classification models, this data augmentation improves recognition rates on all databases to varying degrees, with a remarkable 32.9% increase on the high-resolution XINHUA database. Building on the successful removal of fine lines by WLE, we propose a new Layer Visual Transformer (LViT) design paradigm. Its input adopts distinct blocking strategies, carefully designed to partition the data to capture different levels of spatial and feature information, using larger blocks for global structure and smaller ones for local details. The outputs of these blocking strategies are combined by "sum fusion" and "maximum fusion", exploiting local and global features through complementary information to improve recognition performance and achieve state-of-the-art results on multiple databases. Moreover, LViT requires fewer training iterations thanks to the synergistic effects of the blocking strategies, which streamline the learning process. Finally, by simulating real-world noise conditions, we comprehensively evaluate LViT and find that, compared with traditional methods, our approach exhibits excellent noise-resistant generalization, maintaining stable performance across the PolyU II, IIT Delhi, XINHUA, and NTU-CP-V1 databases.
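The Gabor purification step described above can be illustrated with a minimal NumPy sketch: an oriented sinusoid under a Gaussian envelope responds strongly to lines of a matching orientation and width. The kernel parameters and the naive convolution are illustrative, not taken from the paper.

```python
import numpy as np

def gabor_kernel(ksize=9, sigma=2.0, theta=0.0, lambd=4.0, gamma=0.5, psi=0.0):
    """Real Gabor kernel: a sinusoid at orientation `theta` under a Gaussian
    envelope. Tuned here (illustratively) to respond to wide oriented ridges
    such as palm principal lines."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    env = np.exp(-(xr ** 2 + gamma ** 2 * yr ** 2) / (2 * sigma ** 2))
    return env * np.cos(2 * np.pi * xr / lambd + psi)

def filter_image(img, kernel):
    """Naive 'same'-size 2D correlation (loop form, fine for small kernels)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), mode="edge")
    out = np.empty_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = (padded[i:i + kh, j:j + kw] * kernel).sum()
    return out
```

In practice one would filter at several orientations and keep the maximum response, so that principal lines of any direction survive while fine, narrow lines are suppressed by the envelope width.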
Palmprint recognition based on principal line features. Hongxia Wang, Teng Lv. PeerJ Computer Science 11:e3109 (2025-08-18). DOI: 10.7717/peerj-cs.3109. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453761/pdf/
Pub Date: 2025-08-18. eCollection Date: 2025-01-01. DOI: 10.7717/peerj-cs.3083
Chengming Rao, Zunhao Hu, QiMing Zhao, Min Shan, Li Mao
One of the main challenges in visual object detection is the multi-scale issue, and many approaches have been proposed to tackle it. In this article, we propose a novel neck that performs effective fusion of multi-scale features for a single-stage object detector. This neck, named the deformable convolution and path aggregation network (DePAN), integrates a path aggregation network with a deformable convolution block added to the feature fusion branch to improve the flexibility of feature point sampling. The deformable convolution block is implemented by repeatedly stacking a deformable convolution cell. The DePAN neck can be plugged in and easily applied to various object detection models. We apply the proposed neck to the YOLOv6-N and YOLOv6-T baseline models and test the improved models on the COCO2017 and PASCAL VOC2012 datasets, as well as a medical image dataset. The experimental results verify its effectiveness and applicability to real-world object detection.
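The core idea behind deformable convolution, sampling each kernel tap at a learned fractional offset via bilinear interpolation, can be sketched for a single channel and a single output position. This is a toy illustration of the mechanism, not the DePAN implementation (which in practice would use an optimized op such as torchvision's deform_conv2d).

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Bilinearly sample a 2D feature map at fractional coordinates (y, x)."""
    h, w = feat.shape
    y = np.clip(y, 0, h - 1)
    x = np.clip(x, 0, w - 1)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    return (feat[y0, x0] * (1 - dy) * (1 - dx) + feat[y0, x1] * (1 - dy) * dx
            + feat[y1, x0] * dy * (1 - dx) + feat[y1, x1] * dy * dx)

def deform_conv_point(feat, weights, offsets, cy, cx):
    """One output value of a 3x3 deformable convolution: each tap of the
    regular grid is shifted by its learned (dy, dx) offset before sampling."""
    out = 0.0
    k = 0
    for i in (-1, 0, 1):
        for j in (-1, 0, 1):
            dy, dx = offsets[k]
            out += weights[i + 1, j + 1] * bilinear_sample(feat, cy + i + dy, cx + j + dx)
            k += 1
    return out
```

With all offsets at zero this reduces to an ordinary 3x3 convolution; nonzero offsets let the sampling grid bend toward informative feature points.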
A path aggregation network with deformable convolution for visual object detection. Chengming Rao, Zunhao Hu, QiMing Zhao, Min Shan, Li Mao. PeerJ Computer Science 11:e3083 (2025-08-18). DOI: 10.7717/peerj-cs.3083. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453868/pdf/
Pub Date: 2025-08-18. eCollection Date: 2025-01-01. DOI: 10.7717/peerj-cs.3121
Jing Wang, Muhammad Asif
The rapid advancement of artificial intelligence (AI) has catalyzed transformative changes in education, particularly in mobile and online learning environments. Because existing deep learning models struggle to efficiently integrate the complexity of remote education data and to optimize model performance, this article proposes an intelligent evaluation method for students' learning states based on multimodal data. First, the joint characteristics of the pre-class mental status survey and the health big data of teachers and students in online teaching constitute the input data. Then, a multilayer perceptron (MLP) intelligently identifies each student's status and classifies their enthusiasm for the class. Finally, particle swarm optimization (PSO) is used to tune the model and improve the overall recognition rate. Compared with traditional methods, the PSO-MLP model with combined multimodal data performs well, achieving an accuracy of 0.891. It provides an operational technical solution for the education system, lays a new AI foundation for personalized teaching and student health management by accurately assessing students' learning status, and helps improve the effectiveness and efficiency of remote education.
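The PSO stage can be illustrated with a minimal particle swarm optimizer. Here it minimizes a toy quadratic standing in for the MLP's validation loss; the swarm size, inertia, and acceleration coefficients are illustrative defaults, not the paper's settings.

```python
import numpy as np

def pso(objective, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimizer: each velocity blends inertia, a pull
    toward the particle's personal best, and a pull toward the global best."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-1, 1, (n_particles, dim))
    vel = np.zeros((n_particles, dim))
    pbest = pos.copy()
    pbest_val = np.array([objective(p) for p in pos])
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, pbest_val.min()
```

In a PSO-MLP setup, `objective` would evaluate an MLP (its weights or hyperparameters encoded as the particle vector) on held-out data and return the loss.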
Leveraging PSO-MLP for intelligent assessment of student learning in remote environments: a multimodal approach. Jing Wang, Muhammad Asif. PeerJ Computer Science 11:e3121 (2025-08-18). DOI: 10.7717/peerj-cs.3121. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453797/pdf/
Pub Date: 2025-08-15. eCollection Date: 2025-01-01. DOI: 10.7717/peerj-cs.3098
Rong Zhu, Yong Wang, Junliang Shang, Ling-Yun Dai, Feng Li
Microorganisms play an important role in many complex diseases, influencing their onset, progression, and potential treatment outcomes. Exploring the associations between microbes and human diseases can deepen our understanding of disease mechanisms and assist in improving diagnosis and therapy. However, traditional biological experiments used to uncover such relationships often demand substantial time and resources. In response to these limitations, computational methods have gained traction as more practical tools for predicting microbe-disease associations. Despite their growing use, many of these models still face challenges in terms of accuracy, stability, and adaptability to noisy or sparse data. To overcome these limitations, we propose a novel predictive framework, HyperGraph Neural Network with Transformer for Microbe-Disease Associations (HGNNTMDA), designed to infer potential associations between human microbes and diseases. The framework begins by integrating microbe-disease association data with similarity-based features to construct node representations. Two graph construction strategies are employed: a K-nearest neighbor (KNN)-based adjacency matrix to build a standard graph, and a K-means clustering approach that groups similar nodes into clusters, which serve as hyperedges defining the incidence matrix of a hypergraph. Separate hypergraph neural networks (HGNNs) are then applied to the microbe and disease graphs to extract structured node-level features. An attention mechanism (AM) is subsequently introduced to emphasize informative signals, followed by a Transformer module to capture contextual dependencies and enhance global feature representation. A fully connected layer then projects these features into a unified space, where association scores between microbes and diseases are computed. For model optimization, we propose a hybrid loss strategy combining contrastive loss and Huber loss. The contrastive loss aids in learning discriminative embeddings, while the Huber loss enhances robustness against outliers and improves predictive stability. The effectiveness of HGNNTMDA is validated on two benchmark datasets, HMDAD and Disbiome, using five-fold cross-validation (5CV). Our model achieves an AUC of 0.9976 on HMDAD and 0.9423 on Disbiome, outperforming six existing state-of-the-art methods. Further case studies confirm its practical value in discovering novel microbe-disease associations.
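The two graph-construction strategies can be sketched in NumPy as follows: a symmetric KNN adjacency matrix for the standard graph, and a k-means clustering whose clusters become the hyperedges of a hypergraph incidence matrix. Parameter values and the from-scratch k-means are illustrative; this is not the paper's code.

```python
import numpy as np

def knn_adjacency(X, k=2):
    """Symmetric KNN adjacency matrix from pairwise Euclidean distances."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)            # a node is not its own neighbour
    A = np.zeros_like(d)
    idx = np.argsort(d, axis=1)[:, :k]
    for i, nbrs in enumerate(idx):
        A[i, nbrs] = 1
    return np.maximum(A, A.T)              # symmetrize

def kmeans_incidence(X, n_clusters=2, iters=20, seed=0):
    """Hypergraph incidence matrix H (nodes x hyperedges): each k-means
    cluster of similar nodes becomes one hyperedge."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_clusters, replace=False)]
    for _ in range(iters):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        for c in range(n_clusters):
            if (labels == c).any():
                centers[c] = X[labels == c].mean(axis=0)
    H = np.zeros((len(X), n_clusters))
    H[np.arange(len(X)), labels] = 1
    return H
```

The adjacency matrix A would feed the standard graph convolution, while H (with hyperedges spanning whole clusters) would feed the HGNN layers.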
Optimizing transformer-based prediction of human microbe-disease associations through integrated loss strategies. Rong Zhu, Yong Wang, Junliang Shang, Ling-Yun Dai, Feng Li. PeerJ Computer Science 11:e3098 (2025-08-15). DOI: 10.7717/peerj-cs.3098. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453706/pdf/
The rapid expansion of the Internet of Things (IoT) has significantly increased the volume and diversity of network traffic, making accurate IoT traffic classification crucial for maintaining network security and efficiency. However, existing traffic classification methods, including traditional machine learning and deep learning approaches, often exhibit critical limitations, such as insufficient generalization across diverse IoT environments, dependency on extensive labelled datasets, and susceptibility to overfitting in dynamic scenarios. While recent transformer-based models show promise in capturing contextual information, they typically rely on standard tokenization, which is ill-suited for the irregular nature of IoT traffic and often remains confined to single-purpose tasks. To address these challenges, this study introduces MIND-IoT, a novel and scalable framework for classifying generalized IoT traffic. MIND-IoT employs a hybrid architecture that combines Transformer-based models for capturing long-range dependencies and convolutional neural networks (CNNs) for efficient local feature extraction. A key innovation is IoT-Tokenize, a custom tokenization pipeline designed to preserve the structural semantics of network flows by converting statistical traffic features into semantically meaningful feature-value pairs. The framework operates in two phases: a pre-training phase utilizing masked language modeling (MLM) on large-scale IoT data (UNSW IoT Traces and MonIoTr) to learn robust representations and a fine-tuning phase that adapts the model to specific classification tasks, including binary IoT vs. non-IoT classification, IoT category classification, and device identification. Comprehensive evaluation across multiple diverse datasets (IoT Sentinel, YourThings, and IoT-FCSIT, in addition to the pre-training datasets) demonstrates MIND-IoT's superior performance, robustness, and adaptability compared to traditional methods. The model achieves an accuracy of up to 98.14% and a 97.85% F1-score, demonstrating its ability to classify new datasets and adapt to emerging tasks with minimal fine-tuning and remarkable efficiency. This research positions MIND-IoT as a highly effective and scalable solution for real-world IoT traffic classification challenges.
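A rough sketch of the feature-value tokenization idea, converting statistical flow features into discrete tokens a transformer can consume, might look like the following. The feature names, value ranges, and binning scheme are hypothetical stand-ins, not the actual IoT-Tokenize pipeline.

```python
import numpy as np

def tokenize_flow(features, bins=8, ranges=None):
    """Turn a dict of statistical flow features into 'name_bin' tokens, so a
    transformer sees semantically meaningful feature-value pairs instead of
    raw bytes. Bin edges and ranges here are illustrative."""
    ranges = ranges or {}
    tokens = []
    for name, value in features.items():
        lo, hi = ranges.get(name, (0.0, 1.0))
        # Quantize each value into one of `bins` buckets over its range.
        b = int(np.clip((value - lo) / (hi - lo + 1e-12) * bins, 0, bins - 1))
        tokens.append(f"{name}_{b}")
    return tokens
```

Each resulting token (e.g. a bucketed mean packet length) gets its own embedding, so masked language modeling over token sequences can learn which feature-value combinations co-occur in IoT flows.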
Transformer-based tokenization for IoT traffic classification across diverse network environments. Firdaus Afifi, Faiz Zaki, Hazim Hanif, Nik Aqil, Nor Badrul Anuar. PeerJ Computer Science 11:e3126 (2025-08-15). DOI: 10.7717/peerj-cs.3126. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453836/pdf/
Pub Date: 2025-08-14. eCollection Date: 2025-01-01. DOI: 10.7717/peerj-cs.3064
Zahraa Ahmed, Mesut Çevik
One of the most prominent neurodegenerative diseases globally is Alzheimer's disease (AD). Early diagnosis of AD is challenging due to a complex pathophysiology driven by the presence and accumulation of neurofibrillary tangles and amyloid plaques. Recent advancements in data mining, machine learning, and microarray technologies have, however, enabled a much richer understanding of the genetic underpinnings of AD. Yet the "curse of dimensionality" posed by high-dimensional microarray datasets hampers accurate prediction of the disease through overfitting, bias, and high computational demands. To alleviate these effects, this study proposes a gene selection approach based on the parameter-free, large-scale manta ray foraging optimization (MRFO) algorithm. Across the six investigated datasets, which differ in dimensionality and statistical relationship distributions, and the four evaluated machine learning classifiers, the proposed Sign Random Mutation and Best Rank enhancements substantially improved MRFO's exploration and exploitation, enabling efficient identification of relevant genes and improved machine learning prediction accuracy.
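As a loose, much-simplified stand-in for the described gene selection (the full manta ray foraging optimizer is not reproduced here), the sketch below keeps a binary gene mask, applies random sign-flip mutation loosely echoing the paper's Sign Random Mutation, and accepts masks that improve a cheap correlation-based fitness with a size penalty.

```python
import numpy as np

def select_genes(X, y, n_iter=200, flip_rate=0.25, seed=0):
    """Toy wrapper-style gene selection: flip a few bits of a binary mask each
    iteration and keep the mask if fitness (absolute class-correlation of the
    selected genes minus a per-gene penalty) improves. Illustrative only; the
    real method is MRFO-based."""
    rng = np.random.default_rng(seed)
    n_genes = X.shape[1]
    corr = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_genes)])
    mask = rng.random(n_genes) < 0.5

    def fitness(m):
        return corr[m].sum() - 0.2 * m.sum() if m.any() else -np.inf

    best = fitness(mask)
    for _ in range(n_iter):
        cand = mask ^ (rng.random(n_genes) < flip_rate)   # sign-flip mutation
        f = fitness(cand)
        if f > best:
            mask, best = cand, f
    return mask
```

The per-gene penalty pushes the search toward small masks, so only genes whose class correlation outweighs the penalty survive; in the paper a classifier's accuracy, not a correlation score, would drive the fitness.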
Improving machine learning detection of Alzheimer disease using enhanced manta ray gene selection of Alzheimer gene expression datasets. Zahraa Ahmed, Mesut Çevik. PeerJ Computer Science 11:e3064 (2025-08-14). DOI: 10.7717/peerj-cs.3064. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453835/pdf/
Pub Date: 2025-08-14. eCollection Date: 2025-01-01. DOI: 10.7717/peerj-cs.3076
Siyu Yun, Xinsheng Wang
As the core engine of electronic design automation (EDA) tools, the efficiency of the Boolean satisfiability (SAT) solver largely determines the integrated circuit research and development cycle. With the dramatic growth in integrated circuit scale, SAT solver effectiveness has steadily become the key bottleneck of the circuit design cycle. The primary issue today is the divergence between SAT as used in industry and research focused on pure solution algorithms. We propose a strategy that partitions the SAT problem based on its structural information and then solves the parts. By effectively extracting structural information from the original SAT problem, the self-organizing map (SOM) neural network deployed in the partitioning stage speeds up the sub-thread solvers while avoiding cumbersome parameter tuning. The experimental results demonstrate the stability and scalability of our technique, which can drastically shorten the time required to solve industrial benchmarks from various sources.
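A minimal self-organizing map of the kind described, which would cluster clause (or variable) feature vectors into sub-problem partitions for parallel solving, can be sketched as follows. The grid size, decay schedules, and two-dimensional toy features are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def train_som(data, grid=(2, 2), iters=100, lr0=0.5, sigma0=1.0, seed=0):
    """Minimal self-organizing map: for each sample, find the best-matching
    unit (BMU) and pull nearby grid units toward the sample, with learning
    rate and neighborhood width decaying over time."""
    rng = np.random.default_rng(seed)
    gh, gw = grid
    weights = rng.random((gh * gw, data.shape[1]))
    coords = np.array([(i, j) for i in range(gh) for j in range(gw)], dtype=float)
    for t in range(iters):
        frac = t / iters
        lr = lr0 * (1 - frac)                       # decaying learning rate
        sigma = sigma0 * (1 - frac) + 1e-3          # shrinking neighborhood
        for x in data[rng.permutation(len(data))]:
            bmu = np.linalg.norm(weights - x, axis=1).argmin()
            g = np.exp(-np.linalg.norm(coords - coords[bmu], axis=1) ** 2
                       / (2 * sigma ** 2))
            weights += lr * g[:, None] * (x - weights)
    return weights

def assign(weights, data):
    """Partition: map each sample to its best-matching unit."""
    return np.array([np.linalg.norm(weights - x, axis=1).argmin() for x in data])
```

After training, clauses mapped to the same unit would form one sub-problem handed to a sub-thread solver; unlike k-means, the SOM needs no fine parameter tuning per instance, which matches the motivation stated above.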
{"title":"Multi-step partitioning combined with SOM neural network-based clustering technique effectively improves SAT solver performance.","authors":"Siyu Yun, Xinsheng Wang","doi":"10.7717/peerj-cs.3076","DOIUrl":"https://doi.org/10.7717/peerj-cs.3076","url":null,"abstract":"<p><p>As the core engine of electronic design automation (EDA) tools, the efficiency of Boolean Satisfiability Problem (SAT) solver largely determines the cycle of integrated circuit research and development. The effectiveness of SAT solvers has steadily turned into the key bottleneck of circuit design cycle due to the dramatically increased integrated circuit scale. The primary issue of SAT solver now is the divergence between SAT used in industry and research on pure solution algorithms. We propose a strategy for partitioning the SAT problem based on the structural information then solving it. By effectively extracting the structure information from the original SAT problem, the self-organizing map (SOM) neural network deployed in the division section can speed up the sub-thread solver's processing while avoiding cumbersome parameter adjustments. The experimental results demonstrate the stability and scalability of our technique, which can drastically shorten the time required to solve industrial benchmarks from various sources.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e3076"},"PeriodicalIF":2.5,"publicationDate":"2025-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453816/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145132679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-08-14eCollection Date: 2025-01-01DOI: 10.7717/peerj-cs.3115
Teng Li, Xiaodong Guo, Cun Ji
With the rapid development of the Internet of Things, time series classification (TSC) has gained significant attention from researchers due to its applications in various real-world fields, including electroencephalogram/electrocardiogram classification, emotion recognition, and error message detection. To improve classification performance, numerous TSC methods have been proposed in recent years. Among these, shapelet-based TSC methods are particularly notable for their intuitive interpretability. A critical task within these methods is evaluating the quality of candidate shapelets. This paper provides a comprehensive survey of the state-of-the-art measures for assessing shapelet quality. To present a structured overview, we begin by proposing a taxonomy of these measures, followed by a detailed description of each one. We then discuss these measures, highlighting the challenges faced by current research and offering suggestions for future directions. Finally, we summarize the findings of this survey. We hope that this work will serve as a valuable resource for researchers in the field.
{"title":"A literature survey of shapelet quality measures for time series classification.","authors":"Teng Li, Xiaodong Guo, Cun Ji","doi":"10.7717/peerj-cs.3115","DOIUrl":"10.7717/peerj-cs.3115","url":null,"abstract":"<p><p>With the rapid development of the Internet of Things, time series classification (TSC) has gained significant attention from researchers due to its applications in various real-world fields, including electroencephalogram/electrocardiogram classification, emotion recognition, and error message detection. To improve classification performance, numerous TSC methods have been proposed in recent years. Among these, shapelet-based TSC methods are particularly notable for their intuitive interpretability. A critical task within these methods is evaluating the quality of candidate shapelets. This paper provides a comprehensive survey of the state-of-the-art measures for assessing shapelet quality. To present a structured overview, we begin by proposing a taxonomy of these measures, followed by a detailed description of each one. We then discuss these measures, highlighting the challenges faced by current research and offering suggestions for future directions. Finally, we summarize the findings of this survey. We hope that this work will serve as a valuable resource for researchers in the field.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e3115"},"PeriodicalIF":2.5,"publicationDate":"2025-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453792/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145132639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Intensity-modulated radiation therapy (IMRT) is a prevalent approach for administering radiation therapy in cancer treatment. The primary objective of IMRT is to devise a treatment strategy that eradicates cancer cells from the tumour while minimising damage to the surrounding organs at risk. Conventional IMRT planning entails a sequential procedure: optimising beam intensity for a given set of angles, followed by sequencing. Unfortunately, treatment plans obtained in the optimisation stage are severely impaired after the sequencing stage due to physical and delivery constraints that are not considered during optimisation. One method that tackles these issues is the direct aperture optimisation (DAO) technique. The DAO problem seeks to generate a set of deliverable aperture configurations and a corresponding set of radiation intensities. This method accounts for physical and delivery-time limitations, facilitating the creation of clinically appropriate treatment plans. In this article, we propose and compare two variable neighbourhood search (VNS)-based algorithms, called variable neighbourhood descent (VND) and reduced variable neighbourhood search (rVNS). The VND algorithm is a deterministic variant of VNS that systematically explores different neighbourhood structures, allowing a more thorough exploration of the solution space while maintaining computational efficiency. The rVNS algorithm, unlike traditional VNS algorithms, does not require any transition rule, as it integrates a set of predefined neighbourhood moves at each iteration. We apply both algorithms to prostate cancer cases, achieving highly competitive results. In particular, the proposed rVNS requires 62.75% fewer apertures and achieves a 63.93% reduction in beam-on time compared to the sequential approach's best case, meaning treatment plans can be delivered in less time.
Additionally, we evaluate the clinical quality of the treatment plans using established dosimetric indicators, comparing our results against those produced by matRad's tool for DAO to assess target coverage and organ-at-risk sparing.
{"title":"Comparing variable neighbourhood search algorithms for the direct aperture optimisation in radiotherapy.","authors":"Mauricio Moyano, Keiny Meza-Vasquez, Gonzalo Tello-Valenzuela, Nicolle Ojeda-Ortega, Carolina Lagos, Guillermo Cabrera-Guerrero","doi":"10.7717/peerj-cs.3094","DOIUrl":"10.7717/peerj-cs.3094","url":null,"abstract":"<p><p>Intensity modulated radiation therapy (IMRT) is a prevalent approach for administering radiation therapy in cancer treatment. The primary objective of IMRT is to devise a treatment strategy that eradicates cancer cells from the tumour while minimising damage to the surrounding organs at risk. Conventional IMRT planning entails a sequential procedure: optimising beam intensity for a certain set of angles, followed by sequencing. Unfortunately, treatment plans obtained in the optimisation stage are severely impaired after the sequencing stage due to physical and delivery constraints that are not considered during the optimisation stage. One method that tackles the issues above is the direct aperture optimisation (DAO) technique. The DAO problem seeks to generate a set of deliverable aperture configurations and a corresponding set of radiation intensities. This method accounts for physical and delivery time limitations, facilitating the creation of clinically appropriate treatment programs. In this article, we propose and compare two variable neighbourhood search (VNS) based algorithms, called variable neighbourhood descent (VND) and reduced variable neighbourhood search (rVNS). The VND algorithm is a deterministic variant of VNS that systematically explores different neighbourhood structures. This approach allows for a more thorough solution for space exploration while maintaining computational efficiency. The rVNS, unlike traditional VNS algorithms, does not require any transition rule, as it integrates a set of predefined neighbourhood moves at each iteration. 
We apply our proposed algorithms to prostate cancer cases, achieving highly competitive results for both algorithms. In particular, the proposed rVNS requires 62.75% fewer apertures and achieved a 63.93% reduction in beam-on time compared to the sequential approach's best case, which means treatment plans that can be delivered in less time. Additionally, we evaluate the clinical quality of the treatment plans using established dosimetric indicators, comparing our results against those produced by matRad's tool for DAO to assess target coverage and organ-at-risk sparing.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e3094"},"PeriodicalIF":2.5,"publicationDate":"2025-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453873/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145132379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
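The VND scheme described in that abstract can be sketched generically: cycle through an ordered list of neighbourhood structures, restarting from the first whenever a move improves the incumbent, and stop when no neighbourhood yields an improvement. This is a textbook VND skeleton under illustrative assumptions (list-of-candidates neighbourhoods, a scalar cost function), not the authors' DAO-specific implementation.

```python
def vnd(x0, neighbourhoods, cost):
    """Variable neighbourhood descent.

    neighbourhoods: list of functions, each mapping a solution to a list of
    candidate neighbour solutions; cost: function to minimise.
    """
    x, k = x0, 0
    while k < len(neighbourhoods):
        best = min(neighbourhoods[k](x), key=cost, default=x)
        if cost(best) < cost(x):
            x, k = best, 0   # improvement found: restart from first neighbourhood
        else:
            k += 1           # local optimum for this neighbourhood: try the next
    return x
```

For instance, minimising a quadratic over the integers with step-1 and step-3 move neighbourhoods descends to the optimum; in DAO the solutions would instead encode aperture shapes and intensities, with moves such as shifting leaf positions.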